Introduction by Example
================================

We will briefly introduce the fundamental concepts of NeuroGraph through self-contained examples. We closely follow the data representation format of `PyG `_. Interested readers are therefore referred to the `PyG `_ documentation for an introduction to graph machine learning and PyG's data representation formats.

Loading Benchmark Datasets
----------------------------------

NeuroGraph provides two classes for loading static and dynamic benchmark datasets.

Loading Static Benchmarks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

NeuroGraph uses the `PyG` `InMemoryDataset` class to load datasets. This provides an easy-to-use interface for applying graph machine learning pipelines. For example, the `HCPGender` benchmark can be loaded as follows:

.. code-block:: python
    :linenos:

    from NeuroGraph.datasets import NeuroGraphDataset

    # Load the HCPGender benchmark into the data/ directory
    dataset = NeuroGraphDataset(root="data/", name="HCPGender")

    print(dataset.num_classes)
    print(dataset.num_features)

Loading Dynamic Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We also provide functionality to efficiently store and use the dynamic datasets in the `PyG` Batch format. Here is an example of loading the `DynHCPGender` dataset:

.. code-block:: python
    :linenos:

    from NeuroGraph.datasets import NeuroGraphDynamic

    data_obj = NeuroGraphDynamic(root="data/", name="DynHCPGender")
    dataset = data_obj.dataset  # list of dynamic graphs
    labels = data_obj.labels    # one label per dynamic graph
    print(len(dataset), len(labels))

The dataset is a list of dynamic graphs represented in the `PyG` Batch format, making it compatible with graph machine learning pipelines.

Preprocessing Examples
====================================

To bridge the gap between neuroimaging and graph machine learning domains, NeuroGraph offers tools to easily preprocess and construct graph-based neuroimaging datasets.
Here, we demonstrate how to preprocess your own data to construct functional connectomes and generate the corresponding graph-based representations.

.. code-block:: python
    :linenos:

    from NeuroGraph import utils

    # fmri and regs could be numpy arrays
    fc = utils.preprocess(fmri, regs, n_rois=1000)

The corresponding adjacency matrix and `PyG` data object can be created from the functional connectome as follows.

.. code-block:: python
    :linenos:

    from NeuroGraph import utils

    adj = utils.construct_adj(fc, threshold=5)  # construct the adjacency matrix
    data = utils.construct_data(fc, label=1, threshold=5)  # construct PyG data object

We use the correlation values as node features when constructing the data object from the functional connectome. The following is the source code for processing one fMRI scan with its corresponding regressor using our preprocessing pipeline.

.. code-block:: python
    :linenos:

    from NeuroGraph import utils
    import numpy as np
    from nilearn.image import load_img

    img = load_img("data/raw/1.nii.gz")  # 1.nii.gz is the fMRI scan
    regs = np.loadtxt("data/raw/1.txt")  # 1.txt is the movement regressor
    fmri = img.get_fdata()
    fc = utils.preprocess(fmri, regs, n_rois=100)
    adj = utils.construct_adj(fc, threshold=5)  # construct the adjacency matrix
    data = utils.construct_data(fc, label=1, threshold=5)  # construct torch Data object

Our preprocessing pipeline consists of five steps, which can also be applied separately.

.. code-block:: python
    :linenos:

    from NeuroGraph import utils
    import numpy as np
    from nilearn.image import load_img

    img = load_img("data/raw/1.nii.gz")
    regs = np.loadtxt("data/raw/1.txt")
    fmri = img.get_fdata()
    parcells = utils.parcellation(fmri, n_rois=100)  # this uses the Schaefer atlas by default
    Y = utils.remove_drifts(parcells)
    Y = utils.regress_head_motions(Y, regs)
    fc = utils.construct_corr(Y)
    adj = utils.construct_adj(fc, threshold=5)  # construct the adjacency matrix
    data = utils.construct_data(fc, label=1, threshold=5)

Preprocessing Human Connectome Project (HCP1200) Dataset
==============================================================================

NeuroGraph uses the HCP1200 dataset as its primary data source for exploring the dataset generation search space and constructing benchmarks. The HCP1200 dataset can be accessed from the `HCP website `_ by accepting the data usage terms. The dataset is also available in an AWS S3 bucket, which can be accessed once authorization has been obtained from HCP. In this section, we provide functions that allow you to crawl and preprocess the HCP datasets and construct graph-based neuroimaging datasets from them. These functions streamline obtaining and preparing the data for further analysis and modeling.

Download and preprocess static datasets
---------------------------------------------------

.. code-block:: python
    :linenos:

    from NeuroGraph.preprocess import Brain_Connectome_Rest_Download
    import boto3

    root = "data/"
    name = "HCPGender"
    threshold = 5
    path_to_data = "data/raw/HCPGender"  # store the raw downloaded scans
    n_rois = 100
    n_jobs = 5  # this script runs in parallel and requires the number of jobs as an input

    ACCESS_KEY = ''  # your ConnectomeDB credentials
    SECRET_KEY = ''
    s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)

    # This function requires both HCP_behavioral.csv and ids.pkl files under the root directory.
    # Both files have been provided and can be found under the data directory.
    rest_dataset = Brain_Connectome_Rest_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)

This function downloads the data from the AWS S3 bucket, performs the preprocessing steps, and generates a graph-based dataset. Note that the resulting `rest_dataset` carries four labels: gender, age, working memory, and fluid intelligence. To create separate datasets based on these labels, the following functionalities can be used.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    rest_dataset = preprocess.Brain_Connectome_Rest_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)
    gender_dataset = preprocess.Gender_Dataset(root, "HCPGender", rest_dataset)
    age_dataset = preprocess.Age_Dataset(root, "HCPAge", rest_dataset)
    wm_dataset = preprocess.WM_Dataset(root, "HCPWM", rest_dataset)
    fi_dataset = preprocess.FI_Dataset(root, "HCPFI", rest_dataset)

To construct the State dataset, the following functionalities can be used.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    # root, name, n_rois, threshold, path_to_data, n_jobs and s3 are defined as in the first example
    state_dataset = preprocess.Brain_Connectome_State_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)

If you have the data locally, the following functionalities can be used to preprocess it.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    rest_dataset = preprocess.Brain_Connectome_Rest(root, name, n_rois, threshold, path_to_data, n_jobs)

Similarly, the following function can be used to construct the State dataset.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    state_dataset = preprocess.Brain_Connectome_State(root, name, n_rois, threshold, path_to_data, n_jobs)

Download and preprocess dynamic datasets
---------------------------------------------------

We also offer similar functionalities for constructing dynamic datasets.
You can create a dynamic REST dataset from locally stored data as follows.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    # fmri and regs are the scan and its movement regressor, loaded as in the examples above
    ngd = preprocess.Dyn_Prep(fmri, regs, n_rois=100, window_size=50, stride=3, dynamic_length=None)
    dataset = ngd.dataset
    labels = ngd.labels
    print(len(dataset), len(labels))

Here the dataset is a list containing dynamic graphs in the form of `PyG` Batch objects, which can be easily fed into graph machine learning pipelines. The following example demonstrates how a dynamic REST dataset can be downloaded and preprocessed on the fly.

.. code-block:: python
    :linenos:

    from NeuroGraph import preprocess

    dyn_obj = preprocess.Dyn_Down_Prep(root, name, s3, n_rois=100, threshold=10, window_size=50, stride=3, dynamic_length=150)
    dataset = dyn_obj.data_dict

The `Dyn_Down_Prep` class downloads and preprocesses the REST dataset and provides a dictionary that maps each subject id to a list of dynamic graphs. The dataset can be further preprocessed as follows to construct each benchmark.

.. code-block:: python
    :linenos:

    from torch_geometric.data import Batch, Data
    from NeuroGraph import preprocess

    dyn_obj = preprocess.Dyn_Down_Prep(root, name, s3, n_rois=100, threshold=10, window_size=50, stride=3, dynamic_length=150)
    dataset = dyn_obj.data_dict

    # Gender benchmark: relabel every snapshot with the subject's gender
    gender_dataset, labels = [], []
    for k, v in dataset.items():
        if v is None:
            continue
        y = v[0].y
        gender = int(y[0].item())
        sub = []
        for d in v:
            new_data = Data(x=d.x, edge_index=d.edge_index, y=gender)
            sub.append(new_data)
        batch = Batch.from_data_list(sub)
        gender_dataset.append(batch)
        labels.append(gender)
    print("gender dataset created with {} {} number of instances".format(len(gender_dataset), len(labels)))
    new_dataset = {'labels': labels, "batches": gender_dataset}

    # Age benchmark: relabel every snapshot with the subject's age class
    age_dataset, labels = [], []
    for k, v in dataset.items():
        if v is None:
            continue
        y = v[0].y
        age = int(y[1].item())
        if age <= 2:  # ignoring subjects with age >= 36
            sub = []
            for d in v:
                new_data = Data(x=d.x, edge_index=d.edge_index, y=age)
                sub.append(new_data)
            batch = Batch.from_data_list(sub)
            age_dataset.append(batch)
            labels.append(age)  # store the age label, not the gender label
    print("Age dataset created with {} {} number of instances".format(len(age_dataset), len(labels)))
    new_dataset = {'labels': labels, "batches": age_dataset}
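
The `window_size` and `stride` arguments used above suggest that dynamic graphs are built from overlapping windows of the ROI time series. The following is a minimal sketch of that idea in plain NumPy, not NeuroGraph's implementation. The helper name `sliding_window_connectomes`, the `(n_rois, n_timepoints)` array layout, and the use of a Pearson correlation matrix per window are all assumptions made for illustration.

.. code-block:: python
    :linenos:

    import numpy as np


    def sliding_window_connectomes(timeseries, window_size=50, stride=3):
        """Hypothetical helper: one correlation matrix per sliding window.

        Assumes `timeseries` has shape (n_rois, n_timepoints).
        """
        n_rois, n_timepoints = timeseries.shape
        if window_size > n_timepoints:
            raise ValueError("window_size exceeds the number of time points")
        connectomes = []
        # Slide a fixed-length window over the time axis in steps of `stride`
        for start in range(0, n_timepoints - window_size + 1, stride):
            window = timeseries[:, start:start + window_size]
            connectomes.append(np.corrcoef(window))  # shape (n_rois, n_rois)
        return np.stack(connectomes)


    # Synthetic data standing in for parcellated fMRI signals
    rng = np.random.default_rng(0)
    toy = rng.standard_normal((10, 150))
    dyn_fc = sliding_window_connectomes(toy, window_size=50, stride=3)
    print(dyn_fc.shape)

Under these assumptions, 150 time points with a window of 50 and a stride of 3 give floor((150 - 50) / 3) + 1 = 34 windows, so the stacked array has shape `(34, 10, 10)`. A smaller stride produces more, and more strongly overlapping, snapshots per subject.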