Preprocessing methods

Preprocessing.Preprocessing_AMB.Preprocessing_AMB(DataPath, LabelPath)
Preprocessing function for the Allen Mouse Brain (AMB) dataset.

Cell populations with fewer than 10 members are filtered out, and the labels are converted to the format required for hierarchical classification.

Parameters:
  • DataPath (str) – Local path to the AMB dataset (csv file)

  • LabelPath (str) – Local path to the labels of AMB dataset (csv file)

Returns:

  • Data (pandas dataframe)

  • Labels (list)
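
The population-size filter described above can be illustrated with a minimal sketch. This is not the library's implementation: `filter_small_populations` and its `min_cells` parameter are hypothetical names introduced here, and plain Python lists stand in for the dataframe and label list that the real function returns.

```python
from collections import Counter


def filter_small_populations(samples, labels, min_cells=10):
    """Keep only samples whose label occurs at least `min_cells` times.

    Hypothetical helper for illustration; not part of the Preprocessing module.
    """
    counts = Counter(labels)
    kept = [(s, l) for s, l in zip(samples, labels) if counts[l] >= min_cells]
    kept_samples = [s for s, _ in kept]
    kept_labels = [l for _, l in kept]
    return kept_samples, kept_labels


# Toy input: 12 cells of population "A" and 3 cells of population "B".
samples = list(range(15))
labels = ["A"] * 12 + ["B"] * 3
kept_samples, kept_labels = filter_small_populations(samples, labels)
```

In this toy input, population "B" has only 3 members, so its cells are dropped while all 12 cells of "A" are kept.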

Preprocessing.Preprocessing_Azimuth.Preprocessing_Azimuth_PBMC(h5file)
Preprocessing function for the Azimuth PBMC dataset.

The hierarchical information in the labels is completed and formatted for hierarchical classification. Intermediate and unspecified labels are removed, and cell populations with fewer than 10 members are discarded.

Parameters:

  • h5file (str) – Local path to the Azimuth PBMC dataset (hdf5 file)

Returns:

  • Data (sparse matrix)

  • Labels (list)
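
To give an idea of what "completing" hierarchical label information can look like, here is a small sketch. The exact label format that the function produces is not documented in this section, so everything below is an assumption: the helper name `complete_hierarchy`, the `root` prefix, the `;` separator, and the toy annotation values are all hypothetical.

```python
def complete_hierarchy(levels, root="root", sep=";"):
    """Join coarse-to-fine annotations into a single path-style label.

    Hypothetical helper: the root name and separator are assumptions, not
    the format used by Preprocessing_Azimuth_PBMC. A missing finer level
    (None or empty string) ends the path at the deepest known level.
    """
    path = [root]
    for level in levels:
        if not level:
            break
        path.append(level)
    return sep.join(path)


# Toy annotations, ordered from coarsest to finest level.
cells = [
    ("T cell", "CD4 T", "CD4 Naive"),
    ("B cell", "B naive", None),
]
labels = [complete_hierarchy(c) for c in cells]
```

Each resulting label records the full ancestry of a cell, which is the kind of information a hierarchical classifier needs in order to place a population within the label tree.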

Preprocessing.Preprocessing_COVID.Preprocessing_COVID(LabelPath, DataPath, filter_prolif=True, filter_unspecified=False)
Preprocessing function for the COVID dataset.

Cell populations with fewer than 10 members are filtered out, as are proliferating and unspecified cells when the corresponding flags are set, and the labels are converted to the format required for hierarchical classification.

Parameters:
  • LabelPath (str) – Local path to the labels of COVID dataset (csv file)

  • DataPath (str) – Local path to the COVID dataset (csv file)

  • filter_prolif (Boolean, optional) – Filter out the proliferating cell state labels. The default is True.

  • filter_unspecified (Boolean, optional) – Filter out the unspecified labels. The default is False.

Returns:

  • Data (pandas dataframe)

  • Labels (pandas dataframe) – column 3 contains the input labels for hierarchical classification
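
The effect of the two optional flags can be sketched as follows. This is an illustration only: how the real function recognises proliferating or unspecified labels is not stated here, so the substring markers `prolif_marker` and `unspecified_marker`, the helper name `filter_states`, and the toy labels are hypothetical.

```python
def filter_states(labels, filter_prolif=True, filter_unspecified=False,
                  prolif_marker="prolif", unspecified_marker="unspecified"):
    """Return the indices of labels that survive the two optional filters.

    Hypothetical helper: matching by lowercase substring is an assumption,
    not the rule used by Preprocessing_COVID.
    """
    keep = []
    for i, label in enumerate(labels):
        lowered = label.lower()
        if filter_prolif and prolif_marker in lowered:
            continue
        if filter_unspecified and unspecified_marker in lowered:
            continue
        keep.append(i)
    return keep


toy = ["B cell", "T cell proliferating", "Unspecified", "NK cell"]
default_keep = filter_states(toy)                          # defaults as documented
strict_keep = filter_states(toy, filter_unspecified=True)  # both filters on
```

With the documented defaults, only the proliferating label is dropped; the unspecified label is removed only when `filter_unspecified=True`. Note also that the signature above lists `LabelPath` before `DataPath`, so passing the paths by keyword avoids swapping them by accident.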

Preprocessing.Preprocessing_Flyatlas.Preprocessing_Flyatlas_head(DataPath, LabelPath, FBBT_dfPath)
Preprocessing function for the Flyhead dataset.

The hierarchical information is formatted, and cell populations with fewer than 10 members are discarded.

Note: the rpy2 import statements in the Preprocessing_Flyatlas file are commented out and must be uncommented before running this function.

Parameters:
  • DataPath (str) – Local path to the Flyhead dataset (loom file)

  • LabelPath (str) – Local path to the labels of the Flyhead dataset (csv file)

  • FBBT_dfPath (str) – Local path to the FBbt ontology terms linked to the labels of the Flyhead dataset (csv file)

Returns:

  • Data (pandas dataframe)

  • Labels (list)

Preprocessing.Preprocessing_Flyatlas.Preprocessing_Flyatlas_body(DataPath, LabelPath, FBBT_dfPath)
Preprocessing function for the Flybody dataset.

The hierarchical information is formatted, and cell populations with fewer than 10 members are discarded.

Parameters:
  • DataPath (str) – Local path to the Flybody dataset (loom file)

  • LabelPath (str) – Local path to the labels of the Flybody dataset (csv file)

  • FBBT_dfPath (str) – Local path to the FBbt ontology terms linked to the labels of the Flybody dataset (csv file)

Returns:

  • Data (pandas dataframe)

  • Labels (list)

Small note: I disabled the rpy2 imports in the Preprocessing_Flyatlas file because Read the Docs cannot handle R dependencies. Uncomment them before running the Flyatlas functions!
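
Before re-enabling those imports, it can be useful to check that rpy2 is installed at all. The sketch below uses only the standard library; the helper name `rpy2_available` is hypothetical and not part of the package. It only checks that the Python package can be located, not that a working R installation is present.

```python
import importlib.util


def rpy2_available():
    """Return True if the rpy2 package can be located, without importing it."""
    return importlib.util.find_spec("rpy2") is not None


if not rpy2_available():
    print("rpy2 not found: install it (and R) before using the Flyatlas functions.")
```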