iMSminer package

iMSminer.data_preprocessing module

iMSminer: A Data Processing and Machine Learning Package for Imaging Mass Spectrometry

@author: Yu Tin Lin (yutinlin@stanford.edu)
@author: Haohui Bao (susanab20020911@gmail.com)
@author: Troy R. Scoggins IV (t.scoggins@ufl.edu)
@author: Boone M. Prentice (booneprentice@ufl.chem.edu)

License: Apache-2.0

class iMSminer.data_preprocessing.Preprocess

Bases: object

Contains functions to import imzML files, generate an interactive mean mass spectrum, and perform peak picking, mass alignment, and peak integration

directory : str

Directory that contains all imzML files to preprocess

data_dir : str

Directory to save preprocessed data

gpu : bool

True if GPU-accelerated libraries are imported successfully

dist : int, user input

Minimum number of data points separating adjacent peaks

loq : float, user input

Multiplier k applied to the noise level; the limit of quantification (LOQ) used in peak picking is k * noise

pp_dataset : str, user input

File name of dataset to perform peak picking on

lwr : float, user input

Lower m/z bound of a region in spectrum without signals

upr : float, user input

Upper m/z bound of a region in spectrum without signals

noise : float

Noise level to guide peak picking

z_score : float, user input

Statistical upper threshold (z-score) for noise computation

RP : float

Resolving power [FWHM] used to bin spectra

mz_RP : float

m/z at which RP is calculated

rp_factor : float, user input

Used by method binning: factor that scales the number of bins; affects mass resolution

resolution_progress : str

Sets resolution for binning if yes
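The relationship between RP, mz_RP, and a bin width can be sketched as follows. Resolving power is conventionally m/z divided by the peak FWHM, so the FWHM at mz_RP follows directly. The helper names and the way rp_factor scales the bin count are assumptions for illustration only, not iMSminer's implementation.

```python
import math


def fwhm_from_rp(mz_rp: float, rp: float) -> float:
    """Peak width (FWHM) implied by resolving power rp measured at mz_rp."""
    return mz_rp / rp


def n_bins(mz_min: float, mz_max: float, mz_rp: float, rp: float,
           rp_factor: float = 1.0) -> int:
    """Hypothetical bin count: one bin per FWHM, scaled by rp_factor."""
    width = fwhm_from_rp(mz_rp, rp)
    return math.ceil(rp_factor * (mz_max - mz_min) / width)


# Example: RP of 40,000 at m/z 400 over an m/z 100-1000 range.
width = fwhm_from_rp(400.0, 40000.0)
bins = n_bins(100.0, 1000.0, 400.0, 40000.0, rp_factor=2.0)
print(width, bins)
```

Under this assumption a larger rp_factor gives more, narrower bins, which matches the note that the factor affects mass resolution.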

baseline_subtraction()
check_attributes(*args, **kwargs)
get_p2()

Calculates the noise level on the average spectrum, performs peak picking, and calculates peak widths for peak integration

p2 : np.ndarray (1-D)

m/z bin indices corresponding to peak-picked maxima

p2_width : np.ndarray (1-D)

Peak regions computed by scipy.signal.peak_widths
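A minimal, dependency-free sketch of threshold-and-distance peak picking is given here for orientation. It is not the package's code: the function name pick_peaks and the greedy tallest-first rule are assumptions. In practice, scipy.signal.find_peaks offers comparable height and distance arguments.

```python
def pick_peaks(intensity, threshold, dist):
    """Return indices of local maxima above threshold, at least dist points apart.

    Greedy selection: the tallest candidates are kept first, and any
    candidate closer than dist to an already-kept peak is dropped.
    """
    candidates = [
        i for i in range(1, len(intensity) - 1)
        if intensity[i] > threshold
        and intensity[i] > intensity[i - 1]
        and intensity[i] >= intensity[i + 1]
    ]
    kept = []
    for i in sorted(candidates, key=lambda j: intensity[j], reverse=True):
        if all(abs(i - j) >= dist for j in kept):
            kept.append(i)
    return sorted(kept)


spectrum = [0, 1, 9, 1, 0, 4, 0, 0, 2, 7, 2, 0]
peaks = pick_peaks(spectrum, threshold=3, dist=4)
print(peaks)
```

Here the threshold plays the role of the LOQ (k * noise) and dist the role of the minimum peak separation.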

peak_alignment_func(mz, intensity)

Aligns input intensity array based on peak index or m/z values

Parameters

mz : np.ndarray

m/z array used in alignment

intensity : np.ndarray

Intensity array before alignment

Returns

mz : np.ndarray

m/z array used in alignment

intensity : np.ndarray

Aligned intensity array
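The idea of aligning by grid search can be illustrated with a deliberately simplified sketch: try every integer shift inside a half-width window and keep the one that best overlaps a reference spectrum. The function names are hypothetical, and searching only integer shifts without interpolation is a simplification chosen for this example; it is not claimed to be what peak_alignment_func does internally.

```python
def best_shift(reference, intensity, halfwidth):
    """Integer shift s in [-halfwidth, halfwidth] maximizing overlap with reference."""
    n = len(reference)

    def score(s):
        # Dot product between the reference and the spectrum shifted by s points.
        return sum(reference[i] * intensity[i - s]
                   for i in range(n) if 0 <= i - s < n)

    return max(range(-halfwidth, halfwidth + 1), key=score)


def apply_shift(intensity, s):
    """Shift intensity by s points, padding the exposed end with zeros."""
    n = len(intensity)
    return [intensity[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]


reference = [0, 0, 1, 5, 1, 0, 0, 0]
measured = [0, 0, 0, 0, 1, 5, 1, 0]  # same peak, two points to the right
shift = best_shift(reference, measured, halfwidth=3)
aligned = apply_shift(measured, shift)
print(shift, aligned)
```

A wider half-width tolerates larger mass shifts but evaluates more candidate shifts per spectrum.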

peak_pick(percent_RAM: float = 5, pp_method: str = 'point', rel_height: float = 0.9, peak_alignment: bool = False, align_threshold: float = 1, align_halfwidth: int = 100, grid_iter_num: int = 20, align_reduce: bool = False, reduce_halfwidth: int = 200, plot_aligned_peak: bool = True, index_peak_plot: int = 0, plot_num_peaks: int = 10, baseline_subtract: bool = True, baseline_method: str = 'noise')

Performs peak picking to locate signals above a defined LOQ (k * noise), where k is user-specified and the noise level is calculated from the spectrum, subject to a specified minimum distance between peaks

percent_RAM : float, optional

Percent available RAM occupied by chunk, by default 5

pp_method : str, optional

Method of computing noise, by default “point”
Method point takes the specified lower and upper m/z bounds of a region in the spectrum without signals and computes its standard deviation to define the noise level
Method specify_noise takes a user-specified noise level
Method automatic computes the standard deviation based on spectral data points with a z-score below a threshold k * z-score, where k is specified by the user
Method binning re-bins mass spectra (useful for compressed data with inhomogeneous shapes), then computes noise using method “automatic”

rel_height : float

Peak height cutoff for peak integration, by default 0.9

peak_alignment : bool

Performs peak alignment if True, by default False. Peak alignment function refactored from msalign (https://github.com/lukasz-migas/msalign)

align_threshold : float

Coefficient that selects peaks for alignment: peaks above align_threshold * noise are aligned, by default 1

align_halfwidth : int

Half width [data points] of the window for mass alignment around a specified peak, by default 100

grid_iter_num : int

Number of steps to be used in the grid search, by default 20

align_reduce : bool

Reduces size of m/z and intensity arrays used in alignment if True, by default False

reduce_halfwidth : int

Half width [data points] that defines the reduced size of m/z and intensity arrays used in alignment if align_reduce=True, by default 200

plot_aligned_peak : bool

Plots a specified peak after alignment if True, by default True

index_peak_plot : int

Index of peak to plot if plot_aligned_peak=True, by default 0

plot_num_peaks : int

Number of peaks to plot if plot_aligned_peak=True, by default 10

baseline_subtract : bool

Calculates a baseline and subtracts it from all intensities if baseline_subtract=True, by default True

baseline_method : str

Method of baseline calculation if baseline_subtract=True, by default “noise”
Method regression defines baseline using polynomial regression of input degree
Method noise defines baseline as input coefficient * noise
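As a rough illustration of noise estimation with method point, the sketch below takes the standard deviation of intensities inside a signal-free m/z window and multiplies it by k to obtain the LOQ. The function names and the use of the population standard deviation are assumptions made for this example.

```python
import statistics


def noise_point(mz, intensity, lwr, upr):
    """Standard deviation of intensities whose m/z lies in [lwr, upr]."""
    region = [y for x, y in zip(mz, intensity) if lwr <= x <= upr]
    return statistics.pstdev(region)  # population standard deviation (assumed)


def loq_threshold(noise, k):
    """Limit of quantification defined as k times the noise level."""
    return k * noise


mz = [100.0, 100.1, 100.2, 100.3, 100.4, 100.5]
intensity = [2.0, 4.0, 2.0, 4.0, 50.0, 3.0]
noise = noise_point(mz, intensity, lwr=100.0, upr=100.3)
loq = loq_threshold(noise, k=3)
print(noise, loq)
```

Only the point at m/z 100.4 exceeds the resulting LOQ, so it is the only candidate signal in this toy spectrum.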

peak_pick_func()

Performs peak picking using method point, specify_noise, automatic, binning_even or binning_regression, with optional baseline subtraction using method regression or noise
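Baseline subtraction with method noise can be pictured as removing a flat offset of coefficient * noise from every intensity. The sketch below is a hedged illustration: the function name is hypothetical, and clipping negative results to zero is an assumption, not documented behavior.

```python
def subtract_noise_baseline(intensity, noise, coefficient=1.0):
    """Subtract a flat baseline of coefficient * noise, clipping at zero (assumed)."""
    baseline = coefficient * noise
    return [max(y - baseline, 0.0) for y in intensity]


corrected = subtract_noise_baseline([0.5, 3.0, 10.0], noise=1.0, coefficient=2.0)
print(corrected)
```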

run(percent_RAM: float = 5, peak_alignment: bool = False, integrate_method: str = 'peak_width', align_halfwidth: int = 100, grid_iter_num: int = 20, align_reduce: bool = True, reduce_halfwidth: int = 200, plot_aligned_peak: bool = True, index_peak_plot: int = 0, plot_num_peaks: int = 10)

Imports imzML files and performs peak picking, mass alignment, and peak integration

percent_RAM : float, optional

Percent available RAM occupied by chunk, by default 5

peak_alignment : bool, optional, user input

Performs mass alignment on peaks detected by peak picking if True, by default False

align_halfwidth : int, user input

Half-width of window for alignment, by default 100

grid_iter_num : int, user input

Number of steps in the grid search, by default 20. Larger values give more accurate quantification results but computation time increases quadratically

align_reduce : bool, optional

Reduces the size of the intensity matrix passed into alignment if True, by default True

reduce_halfwidth : int, user input

Half-width of window around peaks to which the intensity matrix is reduced before it is passed into the mass alignment function if align_reduce=True, by default 200

plot_aligned_peak : bool, optional

Renders a figure to show peak alignment results if True, by default True

index_peak_plot : int, user input

Analyte index of the peak to visualize if plot_aligned_peak=True, by default 0

plot_num_peaks : int, user input

Number of peaks (spectra) at index_peak_plot to plot if plot_aligned_peak=True, by default 10
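The percent_RAM setting implies that spectra are processed in chunks sized to a fraction of available memory. A minimal sketch of that sizing logic follows. The function names, the 8-byte value size, and passing available memory in as a plain number are assumptions; a real program might obtain that number from psutil.virtual_memory().available.

```python
def rows_per_chunk(available_bytes, percent_ram, n_columns, bytes_per_value=8):
    """Number of spectra (rows) that fit in percent_ram percent of available memory."""
    budget = available_bytes * percent_ram / 100
    return max(1, int(budget // (n_columns * bytes_per_value)))


def chunk_ranges(n_rows, chunk_rows):
    """Yield (start, stop) row ranges that cover n_rows in order."""
    for start in range(0, n_rows, chunk_rows):
        yield start, min(start + chunk_rows, n_rows)


# Example: 8 GiB available, 5 percent budget, 100,000 m/z bins per spectrum.
rows = rows_per_chunk(available_bytes=8 * 1024**3, percent_ram=5, n_columns=100_000)
ranges = list(chunk_ranges(n_rows=10, chunk_rows=4))
print(rows, ranges)
```

A smaller percent_RAM lowers peak memory use at the cost of more chunks to iterate over.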

iMSminer.data_preprocessing.prompt_for_attributes(prompt_func)

Decorator for question prompts

iMSminer.data_preprocessing.prompt_func(self, attr_name)

Question prompts for various attributes for class Preprocess()

iMSminer.data_analysis module

class iMSminer.data_analysis.DataAnalysis

Bases: object

Performs data analysis on preprocessed intensity matrix and coordinate arrays. Mass alignment function was refactored from the Python module msalign (https://github.com/lukasz-migas/msalign)

data_dir : str, user input

Path pointing to directory containing preprocessed data

fig_ratio : str, user input

Text-to-figure size ratio for rendered figures; options are small, medium, and large

df_pixel_all : pd.DataFrame

Dataframe of pixels by peaks with coordinates, ROIs, and replicates, with columns mapped to the m/z array

mz : pd.Series

Series of m/z values for peaks, with index mapped to columns of df_pixel_all

ROI_info : pd.DataFrame

Dataframe of ROI coordinates with ROI label and replicate number

ROI_num : int, user input

Number of ROIs in dataset

ROIs : str, user input

ROI annotations from left to right, top to bottom

img_array_1c : np.ndarray

Collection of single-channel ion images with positions mapped to x,y coordinates

ion_type : str, user input

Ion types of MS1 hits

mass_diff : float, user input

Mass differences of ion types from the monoisotopic neutral

MS1_db_path : str, user input

Local file path of MS1 database

mz_col : int, user input

Number denoting the database column that holds the exact mass of the monoisotopic neutral

ms1_df : pd.DataFrame

Dataframe containing a table of MS1 hits with chemical information

analyte_class : list

List of analyte classes contained in column class_col of ms1_df

df_mean_all : pd.DataFrame

Dataframe of mean intensities grouped by ROI and replicate

MS1 accurate mass search using a user-supplied database

ppm_threshold : int, optional

ppm threshold for MS1 accurate mass hits, by default 5

MS1_search_method : str, optional

Method for MS1 hits, by default avg_spectrum
Method avg_spectrum performs MS1 search against an average spectrum
Method multi_spectrum performs MS1 search against a collection of spectra stored in a csv file

filter_db : bool, optional

Filters MS1 database prior to accurate mass search if filter_db = True, by default True

percent_RAM : float, optional

Percent available RAM to define size of chunking, by default 5
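An accurate mass search of this kind comes down to a ppm error check between an observed m/z and the expected ion m/z (neutral monoisotopic mass plus the ion type's mass difference). The sketch below shows that arithmetic only. The function names, the singly charged assumption, and the database values are made up for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ms1_hits(observed_mz, neutral_masses, mass_diff, ppm_threshold=5):
    """Return (neutral_mass, ppm) pairs whose ion m/z matches within ppm_threshold."""
    hits = []
    for neutral in neutral_masses:
        ion_mz = neutral + mass_diff  # singly charged ion assumed
        err = ppm_error(observed_mz, ion_mz)
        if abs(err) <= ppm_threshold:
            hits.append((neutral, err))
    return hits


PROTON_ADDUCT = 1.007276  # mass difference for [M+H]+ in Da
database = [759.5778, 760.5851, 785.5935]  # made-up neutral masses
hits = ms1_hits(760.5856, database, PROTON_ADDUCT)
print(hits)
```

Tightening ppm_threshold reduces false matches but can miss true hits when the spectrum is poorly calibrated.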

calibrate_mz()

Interactive calibration using polynomial regression via a linear model of user-specified degree

degree : int, user input

Degree for linear model used in polynomial regression calibration

reference_mz : 1-D array, user input

Reference masses to perform calibration on

calibration_exit : str, user input

Exits calibration if user specifies yes
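Regression-based calibration can be illustrated for the simplest case, degree 1: fit a line that maps measured m/z values onto reference masses, then apply it to the measured axis. This is a hand-rolled least-squares sketch with hypothetical names and made-up numbers; higher degrees would typically use a library routine such as numpy.polyfit.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measured peaks read 10 ppm high relative to the reference masses.
measured = [200.002, 500.005, 800.008]
reference_mz = [200.000, 500.000, 800.000]
slope, intercept = fit_line(measured, reference_mz)
calibrated = [slope * m + intercept for m in measured]
print(calibrated)
```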

filter_analytes(method: str = 'MS1')

Subsets peak-picked untargeted data to analytes of interest

method : str, optional

Type of filtering, by default “MS1”
Method MS1 subsets untargeted data to MS1 hits
Method analyte_class subsets untargeted data to analyte classes from MS1 hits

get_ion_image(replicate: int = 0, show_ROI: bool = True, show_square: bool = True, color_scheme: str = 'inferno', ROI_size_divisor: float = 8, quantile: float = 100)

Renders ion images for analytes in self._df_pixel_all (filtered or unfiltered)

replicate : int, optional

Index of the replicate (dataset) to render, by default 0

show_ROI : bool, optional

Displays ROI label above rendered ion image, by default True

show_square : bool, optional

Displays a green box around selected ROI in rendered ion image, by default True

color_scheme : str, optional

False-color scheme for ion image visualization, by default “inferno”. A list of color schemes is available here: https://matplotlib.org/stable/users/explain/colors/colormaps.html

ROI_size_divisor : float, optional

Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8

quantile : float, optional

Quantile of intensity (possible values in [0, 100]) for ion image visualization, by default 100
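One common use of such a quantile setting is to cap the color scale so that a few very bright pixels do not wash out the image. The sketch below illustrates that idea with a nearest-rank percentile; the function name and the choice of percentile definition are assumptions, and the library may handle the quantile differently.

```python
import math


def clip_at_quantile(pixels, quantile=100.0):
    """Cap intensities at the given percentile (nearest-rank definition)."""
    ordered = sorted(pixels)
    rank = max(1, math.ceil(quantile * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(p, cap) for p in pixels]


image = [1.0, 2.0, 3.0, 4.0, 100.0]  # one hot pixel dominates the scale
clipped = clip_at_quantile(image, quantile=80)
print(clipped)
```

With quantile=100 nothing is clipped, which corresponds to the default.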

image_clustering(k: int = 10, perplexity: float = 5, replicate: int = 0, show_ROI: bool = True, show_square: bool = True, color_scheme: str = 'inferno', insitu_tsne: bool = False, insitu_perplexity: float = 15, zoom: float = 0.1, quantile: float = 100, img_plot_method: str = 'plot_ROI', feature_label='mz', jitter_amount: float = 2, jitter_factor: float = 100, font_size: float = 15, ROI_linewidth: float = 3, ROI_size_divisor: float = 8)

Groups ion images by spatial co-localization. Outputs in situ and box plot visualizations of the mean ion image for each cluster. Clustering analysis and in situ mapping of clusters are summarized in a lower-dimensional t-SNE embedding.

k : int, optional

Number of k-means clusters. The default is 10.

perplexity : float, optional

t-SNE embedding parameter, which influences tightness of embedded neighbors. The default is 5.

replicate : int, optional

Index of the dataset whose ion images are rendered. The default is 0.

show_ROI : bool, optional

Displays ROI label above rendered ion image if True. The default is True.

show_square : bool, optional

Displays a green box around selected ROI in rendered ion image if True. The default is True.

color_scheme : str, optional

False-color scheme for ion image visualization. The default is “inferno”. A list of color schemes is available here: https://matplotlib.org/stable/users/explain/colors/colormaps.html

zoom : float, optional

Size of ion images in the t-SNE embedding relative to the embedding. The default is 0.1.

quantile : float, optional

Maximum intensity quantile cutoff for ion image visualization. The default is 100.

img_plot_method : str, optional

Controls layout of rendered in situ heatmaps, by default plot_ROI
Method plot_img retains all coordinates from the imaging mass spectrometry experiment
Method plot_ROI renders heatmaps of ROIs

jitter_amount : float, optional

Controls placement of feature labels in t-SNE embedding of ion images, by default 2

jitter_factor : float, optional

Controls the repulsiveness of neighboring feature labels in t-SNE embedding of ion images, by default 100

font_size : float, optional

Font size of feature labels in t-SNE embedding of ion images, by default 15

ROI_linewidth : float, optional

Controls width of green box surrounding ROIs, by default 3

ROI_size_divisor : float, optional

Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8

insitu_clustering(k: int = 10, perplexity: float = 15, replicate: int = 0, show_ROI: bool = True, show_square: bool = True, ROI_linewidth: float = 3, ROI_size_divisor: float = 8, insitu_tsne: bool = False)

In situ segmentation via k-means clustering. Groups pixels by similarity in molecular profiles. Outputs in situ visualization of ROIs colored by cluster labels. In situ segmentation with cluster and ROI annotations is visualized in a lower-dimensional t-SNE embedding.

k : int, optional

Number of k-means clusters. The default is 10.

perplexity : float, optional

t-SNE embedding parameter, which influences tightness of embedded neighbors. The default is 15.

replicate : int, optional

Index of the dataset whose ion images are rendered. The default is 0.

show_ROI : bool, optional

Displays ROI label above rendered ion image if True. The default is True.

show_square : bool, optional

Displays a green box around selected ROI in rendered ion image if True. The default is True.

ROI_linewidth : float, optional

Controls width of green box surrounding ROIs, by default 3

ROI_size_divisor : float, optional

Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8

insitu_tsne : bool, optional

Renders an RGB in situ representation of 3D t-SNE embedding if insitu_tsne = True, by default False
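For readers unfamiliar with k-means segmentation, the sketch below groups pixels by the similarity of their intensity profiles using a bare-bones Lloyd's algorithm. It is a teaching sketch only: the function names, the evenly spaced seeding, and the fixed iteration count are assumptions, and real analyses would normally rely on an established implementation such as scikit-learn's KMeans.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with deterministic, evenly spaced seeds."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c, p=p: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Six pixels, two analytes each: three resemble profile A, three profile B.
pixels = [(1.0, 0.1), (0.9, 0.0), (1.1, 0.2), (0.0, 1.0), (0.1, 0.9), (0.2, 1.1)]
labels, centroids = kmeans(pixels, k=2)
print(labels)
```

Mapping each label back to its pixel coordinates gives the segmentation image colored by cluster.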

load_preprocessed_data()

Imports preprocessed intensity matrix and coordinate arrays, performs ROI annotation and selection, and stores information for further data analysis

make_FC_plot(pthreshold: float = 0.05, FCthreshold: float = 1.5, legend_label: str = 'condition', feature_label: str = 'mz', hm_label: str = 'mz', jitter_amount: float = 2, jitter_factor: float = 100, font_size: float = 15, get_hm: bool = True, hm_width_factor: float = 10, hm_height_factor: float = 30, hm_fontsize: float = 10, hm_wspace: float = 1.5)

Generates volcano plots of permuted ROI pairs, showing fold-change statistics and p-values

pthreshold : float, optional

P-value threshold for statistical significance in volcano plot, by default 0.05

FCthreshold : float, optional

Absolute fold change threshold for significant dysregulation in volcano plot, by default 1.5

legend_label : str, optional

Labeling scheme for legend of volcano plot, by default condition
Method condition colors data points by expression condition
Method analyte_class colors significant data points by class of analyte from MS1 search. Prerequisite: DataAnalysis.MS1_search()

feature_label : str, optional

Labeling scheme for data points in volcano plot, by default mz
Method mz labels significant data points by their corresponding m/z values
Method analyte labels significant data points by their corresponding analyte IDs. Prerequisite: DataAnalysis.MS1_search()

hm_label : str, optional

Labeling scheme for data points in heatmap, by default mz
Method mz visualizes m/z values
Method analyte visualizes analyte IDs. Prerequisite: DataAnalysis.MS1_search()

jitter_amount : float, optional

Controls placement of feature labels in volcano plot, by default 2

jitter_factor : float, optional

Controls the repulsiveness of neighboring feature labels, by default 100

font_size : float, optional

Font size of feature labels in volcano plot, by default 15

get_hm : bool, optional

Renders a heatmap of feature labels by ROI, with fold changes as entries, if get_hm = True, by default True

hm_width_factor : float, optional

Controls width of heatmaps, by default 10

hm_height_factor : float, optional

Controls height of heatmaps, by default 30

hm_fontsize : float, optional

Controls font size in heatmaps, by default 10

hm_wspace : float, optional

Controls column spacing between heatmaps, by default 1.5
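The two thresholds combine in a volcano plot roughly as sketched below: a feature counts as dysregulated only if its p-value is below pthreshold and its fold change is at least FCthreshold in either direction. The function names, the ratio-style fold change, and the symmetric 1 / threshold rule for down-regulation are assumptions for illustration.

```python
import math


def classify(fold_change, p_value, pthreshold=0.05, fc_threshold=1.5):
    """Label one feature as 'up', 'down', or 'ns' (not significant)."""
    if p_value >= pthreshold:
        return "ns"
    if fold_change >= fc_threshold:
        return "up"
    if fold_change <= 1 / fc_threshold:
        return "down"
    return "ns"


def volcano_xy(fold_change, p_value):
    """Volcano plot coordinates: log2 fold change and -log10 p-value."""
    return math.log2(fold_change), -math.log10(p_value)


# Made-up (fold change, p-value) pairs keyed by m/z.
features = {"760.585": (2.0, 0.01), "782.567": (0.5, 0.01), "734.569": (1.2, 0.001)}
labels = {mz: classify(fc, p) for mz, (fc, p) in features.items()}
print(labels)
```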

make_boxplot()

Box plot comparison of mean ion intensity across ROIs with pairwise statistical comparison. Statistical significance thresholds are represented as * if 0.05 > $p$-value $\geq$ 0.01, ** if 0.01 > $p$-value $\geq$ 0.001, and *** if $p$-value $\leq$ 0.001.
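The star notation above can be written as a small lookup. The function name is hypothetical; checking the *** case first means a p-value of exactly 0.001 is reported as ***, which is one reading of the overlapping boundary in the description.

```python
def significance_stars(p_value):
    """Map a p-value to the star annotation described for make_boxplot."""
    if p_value <= 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


for p in (0.2, 0.03, 0.005, 0.0004):
    print(p, significance_stars(p))
```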

normalize_pixel(method: str = 'TIC')

Normalizes pixels using the specified method

method : str, optional

Method to compute normalization factor, by default “TIC”
Method TIC normalizes on total ion count over each pixel
Method RMS normalizes on root mean square over each pixel
Method reference normalizes on a reference analyte specified by m/z or index
Method max normalizes on maximum intensity of each pixel
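The four normalization factors can be compared side by side on a single pixel's spectrum. This is a sketch under stated assumptions: the function name normalize_spectrum and the reference_index argument are hypothetical, and leaving an all-zero pixel unchanged is a choice made here, not documented behavior.

```python
import math


def normalize_spectrum(spectrum, method="TIC", reference_index=None):
    """Divide one pixel's intensities by the factor chosen by method."""
    if method == "TIC":
        factor = sum(spectrum)  # total ion count
    elif method == "RMS":
        factor = math.sqrt(sum(v * v for v in spectrum) / len(spectrum))
    elif method == "max":
        factor = max(spectrum)
    elif method == "reference":
        factor = spectrum[reference_index]  # intensity of the reference analyte
    else:
        raise ValueError(f"unknown method: {method}")
    # A zero factor (empty pixel) is left unchanged to avoid division by zero.
    return [v / factor for v in spectrum] if factor else list(spectrum)


pixel = [1.0, 3.0, 4.0]
for name in ("TIC", "RMS", "max"):
    print(name, normalize_spectrum(pixel, method=name))
print("reference", normalize_spectrum(pixel, method="reference", reference_index=1))
```

TIC makes each pixel sum to 1, whereas max and reference scale so that a chosen peak equals 1.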

optimize_image_clustering(k_max: int = 10)

Groups ion images by spatial co-localization. Outputs in situ and box plot visualizations of the mean ion image for each cluster. Clustering analysis and in situ mapping of clusters are summarized in a lower-dimensional t-SNE embedding.

k_max : int, optional

Maximum number of clusters for cluster validity evaluation, by default 10

optimize_insitu_clustering(k_max: int = 10)

In situ segmentation via k-means clustering. Groups pixels by similarity in molecular profiles. Outputs in situ visualization of ROIs colored by cluster labels. In situ segmentation with cluster and ROI annotations is visualized in a lower-dimensional t-SNE embedding.

k_max : int, optional

Maximum number of clusters for cluster validity evaluation, by default 10