iMSminer package
iMSminer.data_preprocessing module
iMSminer: A Data Processing and Machine Learning Package for Imaging Mass Spectrometry @author: Yu Tin Lin (yutinlin@stanford.edu) @author: Haohui Bao (susanab20020911@gmail.com) @author: Troy R. Scoggins IV (t.scoggins@ufl.edu) @author: Boone M. Prentice (booneprentice@ufl.chem.edu) License: Apache-2.0
- class iMSminer.data_preprocessing.Preprocess[source]
Bases: object
Contains functions to import imzML files, generate an interactive mean mass spectrum, and perform peak picking, mass alignment, and peak integration
- directory: str
Directory that contains all imzML files to preprocess
- data_dir: str
Directory to save preprocessed data
- gpu: bool
True if GPU-accelerated libraries are imported successfully
- dist: int, user input
Minimum number of data points separating adjacent peaks
- loq: float, user input
Coefficient k that defines the limit of quantification (k * noise) used in peak picking
- pp_dataset: str, user input
File name of dataset to perform peak picking on
- lwr: float, user input
Lower m/z bound of a region in the spectrum without signals
- upr: float, user input
Upper m/z bound of a region in the spectrum without signals
- noise: float
Noise level to guide peak picking
- z_score: float, user input
Statistical upper threshold for noise computation
- RP: float
Resolving power [FWHM] used to bin spectra
- mz_RP: float
m/z at which RP is calculated
- rp_factor: float, user input
Factor that scales the number of bins in method binning; affects mass resolution
- resolution_progress: str
Sets resolution for binning if yes
- get_p2()[source]
Calculates noise level on average spectrum, performs peak picking, and calculates peak widths for peak integration
- p2: 1D np.ndarray
m/z bin indices corresponding to peak-picked maxima
- p2_width: 1D np.ndarray
Peak regions computed by peak_widths (scipy.signal)
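The peak-picking rule described for get_p2 (keep maxima above k * noise that are at least dist data points apart) can be sketched as follows. This is a minimal illustration using only the standard library, not the package's implementation; `pick_peaks` and the toy spectrum are hypothetical. In practice, `scipy.signal.find_peaks` with its `height` and `distance` arguments covers the same ground.

```python
import statistics

def pick_peaks(intensity, noise, loq=3.0, dist=2):
    """Return indices of local maxima above loq * noise, at least `dist` points apart.

    Hypothetical helper for illustration; not part of iMSminer.
    """
    threshold = loq * noise
    candidates = [
        i for i in range(1, len(intensity) - 1)
        if intensity[i] > threshold
        and intensity[i] > intensity[i - 1]
        and intensity[i] >= intensity[i + 1]
    ]
    # Visit the tallest candidates first and drop any that crowd a kept peak
    kept = []
    for i in sorted(candidates, key=lambda j: intensity[j], reverse=True):
        if all(abs(i - j) >= dist for j in kept):
            kept.append(i)
    return sorted(kept)

# Toy mean spectrum; indices 8-11 play the role of the signal-free region
spectrum = [0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 3.0, 2.5, 0.1, 0.2, 0.1, 0.15]
noise = statistics.pstdev(spectrum[8:12])
peaks = pick_peaks(spectrum, noise, loq=10.0, dist=2)
print(peaks)
```

Raising `dist` merges neighbouring maxima into the taller one, while raising `loq` discards low-intensity peaks.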
- peak_alignment_func(mz, intensity)[source]
Aligns input intensity array based on peak index or m/z values
- mz: np.ndarray
m/z array used in alignment
- intensity: np.ndarray
Intensity array before alignment
- mz: np.ndarray
m/z array used in alignment
- intensity: np.ndarray
Aligned intensity array
- peak_pick(percent_RAM: float = 5, pp_method: str = 'point', rel_height: float = 0.9, peak_alignment: bool = False, align_threshold: float = 1, align_halfwidth: int = 100, grid_iter_num: int = 20, align_reduce: bool = False, reduce_halfwidth: int = 200, plot_aligned_peak: bool = True, index_peak_plot: int = 0, plot_num_peaks: int = 10, baseline_subtract: bool = True, baseline_method: str = 'noise')[source]
Performs peak picking to locate signals above a defined LOQ (k * noise), given a user-specified k, a computed noise level, and a specified minimum distance between peaks
- percent_RAM: float, optional
Percent available RAM occupied by chunk, by default 5
- pp_method: str, optional
Method of computing noise, by default “point”. Method point takes the specified lower and upper m/z bounds of a region in the spectrum without signals and computes its standard deviation to define the noise level. Method specify_noise takes a user-specified noise level. Method automatic computes the standard deviation of spectral data points with a z-score below a threshold k * z-score, where k is specified by the user. Method binning re-bins mass spectra (useful for compressed data with inhomogeneous shapes), then computes noise using method “automatic”.
- rel_height: float
Peak height cutoff for peak integration, by default 0.9
- peak_alignment: bool
Performs peak alignment if True, by default False. Peak alignment function refactored from msalign (https://github.com/lukasz-migas/msalign)
- align_threshold: float
Coefficient that selects peaks for alignment; peaks above align_threshold * noise are aligned, by default 1
- align_halfwidth: int
Half width [data points] to define window for mass alignment around a specified peak, by default 100
- grid_iter_num: int
Number of steps to be used in the grid search, by default 20
- align_reduce: bool
Reduces size of m/z and intensity arrays used in alignment if True, by default False
- reduce_halfwidth: int
Half width [data points] to define reduction size of m/z and intensity arrays used in alignment if align_reduce=True, by default 200
- plot_aligned_peak: bool
Plots a specified peak after alignment if True, by default True
- index_peak_plot: int
Index of peak to plot if plot_aligned_peak=True, by default 0
- plot_num_peaks: int
Number of peaks to plot if plot_aligned_peak=True, by default 10
- baseline_subtract: bool
Calculates the baseline and subtracts it from all intensities if True, by default True
- baseline_method: str
Method of baseline calculation if baseline_subtract=True, by default “noise”. Method regression defines the baseline using polynomial regression of input degree. Method noise defines the baseline as input coefficient * noise.
- peak_pick_func()[source]
Performs peak picking using method point, specify_noise, automatic, binning_even or binning_regression, with optional baseline subtraction using method regression or noise
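The difference between noise methods point and automatic can be illustrated with a short sketch. It assumes that "point" takes the standard deviation inside a signal-free m/z window and that "automatic" takes the standard deviation of data points whose z-score falls below a cutoff; the function names and the exact cutoff rule are assumptions for illustration, not the package's code.

```python
import statistics

def noise_point(mz, intensity, lwr, upr):
    """Standard deviation of intensities inside a signal-free m/z window [lwr, upr]."""
    region = [y for x, y in zip(mz, intensity) if lwr <= x <= upr]
    return statistics.pstdev(region)

def noise_automatic(intensity, z_score=1.0):
    """Standard deviation of data points whose z-score is below `z_score`."""
    mean = statistics.fmean(intensity)
    sd = statistics.pstdev(intensity)
    if sd == 0:
        return 0.0  # flat spectrum: no variation to report
    quiet = [y for y in intensity if (y - mean) / sd < z_score]
    return statistics.pstdev(quiet)

# Toy spectrum with one strong signal at m/z 104
mz = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
intensity = [1, 2, 1, 2, 50, 1, 2, 1, 2, 1]

n_point = noise_point(mz, intensity, lwr=105, upr=109)
n_auto = noise_automatic(intensity, z_score=1.0)
print(n_point, n_auto)
```

Both estimates exclude the signal at m/z 104, so they land close to each other; "point" relies on the user knowing an empty region, whereas "automatic" finds the quiet points from the data.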
- run(percent_RAM: float = 5, peak_alignment: bool = False, integrate_method: str = 'peak_width', align_halfwidth: int = 100, grid_iter_num: int = 20, align_reduce: bool = True, reduce_halfwidth: int = 200, plot_aligned_peak: bool = True, index_peak_plot: int = 0, plot_num_peaks: int = 10)[source]
Imports imzML files and performs peak picking, mass alignment, and peak integration
- percent_RAM: float, optional
Percent available RAM occupied by chunk, by default 5
- peak_alignment: bool, optional, user input
Performs mass alignment on peaks detected by peak picking if True, by default False
- align_halfwidth: int, user input
Half-width of window for alignment, by default 100
- grid_iter_num: int, user input
Number of steps used in the grid search, by default 20. Larger values give more accurate quantification results, but computation time increases quadratically
- align_reduce: bool, optional
Reduces the size of the intensity matrix passed into alignment if True, by default True
- reduce_halfwidth: int, user input
Half-width of window around peaks to which the intensity matrix is reduced before it is passed into the mass alignment function if align_reduce=True, by default 200
- plot_aligned_peak: bool, optional
Renders a figure to show peak alignment results if True, by default True
- index_peak_plot: int, user input
Analyte index of the peak to visualize if plot_aligned_peak=True, by default 0
- plot_num_peaks: int, user input
Number of peaks (spectra) at index_peak_plot to plot if plot_aligned_peak=True, by default 10
iMSminer.data_analysis module
- class iMSminer.data_analysis.DataAnalysis[source]
Bases: object
Performs data analysis on preprocessed intensity matrix and coordinate arrays. Mass alignment function was refactored from the Python module msalign (https://github.com/lukasz-migas/msalign)
- data_dir: str, user input
Path pointing to directory containing preprocessed data
- fig_ratio: str, user input
Text-to-figure ratio for rendered figures; options are small, medium, and large
- df_pixel_all: pd.DataFrame
Dataframe of pixels by peaks with coordinates, ROIs, and replicates, with columns mapped to m/z array
- mz: pd.Series
Series of m/z values for peaks, with index mapped to columns of df_pixel_all
- ROI_info: pd.DataFrame
Dataframe of ROI coordinates with ROI label and replicate number
- ROI_num: int, user input
Number of ROIs in dataset
- ROIs: str, user input
ROI annotations from left to right, top to bottom
- img_array_1c: np.ndarray
Collection of single-channel ion images with positions mapped to x,y coordinates
- ion_type: str, user input
Ion types of MS1 hits
- mass_diff: float, user input
Mass differences of ion types from monoisotopic neutral
- MS1_db_path: str, user input
Local file path of MS1 database
- mz_col: int, user input
Number denoting column in database corresponding to exact mass of monoisotopic neutral
- ms1_df: pd.DataFrame
Dataframe containing a table of MS1 hits with chemical information
- analyte_class: list
List of analyte classes contained in column class_col of ms1_df
- df_mean_all: pd.DataFrame
Dataframe of mean intensities grouped by ROI and replicate
- MS1_search(ppm_threshold: float = 5, MS1_search_method: str = 'avg_spectrum', filter_db: bool = True, percent_RAM: float = 5)[source]
MS1 accurate mass search against a user-supplied database
- ppm_threshold: float, optional
ppm threshold for MS1 accurate mass hits, by default 5
- MS1_search_method: str, optional
Method for MS1 hits, by default avg_spectrum. Method avg_spectrum performs MS1 search against an average spectrum. Method multi_spectrum performs MS1 search against a collection of spectra stored in a csv file.
- filter_db: bool, optional
Filters MS1 database prior to accurate mass search if filter_db = True, by default True
- percent_RAM: float, optional
Percent available RAM to define size of chunking, by default 5
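An accurate mass search of this kind compares each observed m/z with a theoretical m/z (monoisotopic neutral mass plus the mass difference of the ion type) and keeps matches within ppm_threshold. The sketch below shows that arithmetic only; `ppm_error`, `ms1_hits`, and the two-entry database are hypothetical and do not reflect the package's internals.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def ms1_hits(observed_mz, neutral_masses, mass_diff, ppm_threshold=5.0):
    """Return (name, ppm error) for database entries within the ppm threshold."""
    hits = []
    for name, neutral in neutral_masses.items():
        theoretical = neutral + mass_diff  # e.g. [M+H]+ adds one proton mass
        error = ppm_error(observed_mz, theoretical)
        if abs(error) <= ppm_threshold:
            hits.append((name, error))
    return hits

# Hypothetical database of monoisotopic neutral masses
database = {"compound_A": 500.0000, "compound_B": 500.0100}
PROTON = 1.007276  # mass difference for [M+H]+ in Da

hits = ms1_hits(501.0080, database, mass_diff=PROTON, ppm_threshold=5.0)
print(hits)
```

Here compound_A matches at roughly 1.4 ppm, while compound_B is about 18.5 ppm away and is rejected.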
- calibrate_mz()[source]
Interactive calibration using polynomial regression via linear model of user-specified degree
- degree: int, user input
Degree for linear model used in polynomial regression calibration
- reference_mz: 1D array, user input
Reference masses to perform calibration on
- calibration_exit: str, user input
Exits calibration if user specifies yes
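As a rough illustration of regression-based calibration, the sketch below fits the degree-1 case by ordinary least squares, mapping observed m/z values onto reference masses and then applying the fit. The helper `fit_linear` and the example masses are hypothetical; higher degrees would need a general polynomial fit (for instance `numpy.polyfit`), and the package's own procedure may differ.

```python
def fit_linear(observed, reference):
    """Least-squares fit of reference = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Observed peaks carry a 100 ppm proportional error relative to the references
reference_mz = [100.0, 500.0, 900.0]
observed_mz = [100.01, 500.05, 900.09]

slope, intercept = fit_linear(observed_mz, reference_mz)
calibrated = [slope * x + intercept for x in observed_mz]
print(calibrated)
```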
- filter_analytes(method: str = 'MS1')[source]
Subset peak-picked untargeted data to analytes of interest
- method: str, optional
Type of filtering, by default “MS1”. Method “MS1” subsets untargeted data to MS1 hits. Method “analyte_class” subsets untargeted data to analyte classes from MS1 hits.
- get_ion_image(replicate: int = 0, show_ROI: bool = True, show_square: bool = True, color_scheme: str = 'inferno', ROI_size_divisor: float = 8, quantile: float = 100)[source]
Render ion image for analytes in self._df_pixel_all (filtered or unfiltered)
- replicate: int, optional
Render image from replicate (dataset) #, by default 0
- show_ROI: bool, optional
Display ROI label above rendered ion image, by default True
- show_square: bool, optional
Display a green box around selected ROI in rendered ion image, by default True
- color_scheme: str, optional
False-color scheme for ion image visualization, by default “inferno”. A list of color schemes is available here: https://matplotlib.org/stable/users/explain/colors/colormaps.html
- ROI_size_divisor: float, optional
Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8
- quantile: float, optional
Quantile of intensity (possible values in [0, 100]) for ion image visualization, by default 100
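One common way to use an intensity quantile in image rendering is to cap pixel values at that quantile so a few hot pixels do not dominate the color scale. The sketch below assumes that interpretation; `percentile` and `clip_to_quantile` are hypothetical helpers, and the package may apply the quantile differently.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

def clip_to_quantile(image, q=100):
    """Cap every pixel of a 2D image (list of rows) at the q-th percentile."""
    flat = [v for row in image for v in row]
    cap = percentile(flat, q)
    return [[min(v, cap) for v in row] for row in image]

image = [[1, 2, 3], [4, 5, 100]]  # one hot pixel
clipped = clip_to_quantile(image, q=80)
print(clipped)
```

With q=100 the image is returned unchanged, which matches the default of showing the full intensity range.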
- image_clustering(k: int = 10, perplexity: float = 5, replicate: int = 0, show_ROI: bool = True, show_square: bool = True, color_scheme: str = 'inferno', insitu_tsne: bool = False, insitu_perplexity: float = 15, zoom: float = 0.1, quantile: float = 100, img_plot_method: str = 'plot_ROI', feature_label='mz', jitter_amount: float = 2, jitter_factor: float = 100, font_size: float = 15, ROI_linewidth: float = 3, ROI_size_divisor: float = 8)[source]
Groups ion images by spatial co-localization. Outputs in situ and box plot visualizations of the mean ion image for each cluster. Clustering analysis and in situ mapping of clusters are summarized in a lower-dimensional t-SNE embedding.
- k: int, optional
Number of k-means clusters. The default is 10.
- perplexity: float, optional
t-SNE embedding parameter, which influences tightness of embedded neighbors. The default is 5.
- replicate: int, optional
Dataset # of which ion images are rendered. The default is 0.
- show_ROI: bool, optional
Display ROI label above rendered ion image if True. The default is True.
- show_square: bool, optional
Display a green box around selected ROI in rendered ion image if True. The default is True.
- color_scheme: str, optional
False-color scheme for ion image visualization. The default is “inferno”. A list of color schemes is available here: https://matplotlib.org/stable/users/explain/colors/colormaps.html
- zoom: float, optional
Size of ion images in the t-SNE embedding relative to the embedding. The default is 0.1.
- quantile: float, optional
Maximum intensity quantile cutoff for ion image visualization. The default is 100.
- img_plot_method: str, optional
Controls layout of rendered in situ heatmaps, by default “plot_ROI”. Method plot_img retains all coordinates from the imaging mass spectrometry experiment. Method plot_ROI renders heatmaps of ROIs.
- jitter_amount: float, optional
Controls placement of feature labels in t-SNE embedding of ion images, by default 2
- jitter_factor: float, optional
Controls the repulsiveness of neighboring feature labels in t-SNE embedding of ion images, by default 100
- font_size: float, optional
Font size of feature labels in t-SNE embedding of ion images, by default 15
- ROI_linewidth: float, optional
Controls width of green box surrounding ROIs, by default 3
- ROI_size_divisor: float, optional
Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8
- insitu_clustering(k: int = 10, perplexity: float = 15, replicate: int = 0, show_ROI: bool = True, show_square: bool = True, ROI_linewidth: float = 3, ROI_size_divisor: float = 8, insitu_tsne: bool = False)[source]
In situ segmentation via k-means clustering. Groups pixels by similarity in molecular profiles. Outputs in situ visualization of ROIs colored by cluster labels. In situ segmentation with cluster and ROI annotations is visualized in a lower-dimensional t-SNE embedding.
- k: int, optional
Number of k-means clusters. The default is 10.
- perplexity: float, optional
t-SNE embedding parameter, which influences tightness of embedded neighbors. The default is 15.
- replicate: int, optional
Dataset # of which ion images are rendered. The default is 0.
- show_ROI: bool, optional
Display ROI label above rendered ion image if True. The default is True.
- show_square: bool, optional
Display a green box around selected ROI in rendered ion image if True. The default is True.
- ROI_linewidth: float, optional
Controls width of green box surrounding ROIs, by default 3
- ROI_size_divisor: float, optional
Controls size of ROI labels, where a smaller divisor gives a larger ROI label, by default 8
- insitu_tsne: bool, optional
Renders an RGB in situ representation of 3D t-SNE embedding if insitu_tsne = True, by default False
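To make the idea of grouping pixels by molecular profile concrete, here is a small standard-library sketch of k-means (Lloyd's algorithm) applied to per-pixel intensity vectors. It is an illustration under simplifying assumptions (deterministic initialization from the first k pixels, fixed iteration count); the `kmeans` function is hypothetical and is not the clustering routine the package calls.

```python
def kmeans(pixels, k, n_iter=20):
    """Cluster pixel spectra with Lloyd's algorithm; returns one label per pixel.

    Centroids start at the first k pixels so the result is deterministic.
    """
    centroids = [list(p) for p in pixels[:k]]
    labels = [0] * len(pixels)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in pixels
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Four pixels, three peaks each: two pixels rich in peak 0, two rich in peak 1
pixels = [[10, 0, 1], [0, 9, 1], [11, 1, 0], [1, 10, 0]]
labels = kmeans(pixels, k=2)
print(labels)
```

Mapping each label back to the pixel's x,y coordinate gives the segmentation image.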
- load_preprocessed_data()[source]
Imports preprocessed intensity matrix and coordinate arrays, performs ROI annotation and selection, and stores information for further data analysis
- make_FC_plot(pthreshold: float = 0.05, FCthreshold: float = 1.5, legend_label: str = 'condition', feature_label: str = 'mz', hm_label: str = 'mz', jitter_amount: float = 2, jitter_factor: float = 100, font_size: float = 15, get_hm: bool = True, hm_width_factor: float = 10, hm_height_factor: float = 30, hm_fontsize: float = 10, hm_wspace: float = 1.5)[source]
Generate volcano plots of permuted ROI pairs, showing fold-change statistics and p-values
- pthreshold: float, optional
P-value threshold for statistical significance in volcano plot, by default 0.05
- FCthreshold: float, optional
Absolute fold change threshold for significant dysregulation in volcano plot, by default 1.5
- legend_label: str, optional
Labeling scheme for legend of volcano plot, by default condition. Method condition colors data points by expression condition. Method analyte_class colors significant data points by class of analyte from MS1 search. Prerequisite: DataAnalysis.MS1_search()
- feature_label: str, optional
Labeling scheme for data points in volcano plot, by default mz. Method mz labels significant data points by their corresponding m/z values. Method analyte labels significant data points by their corresponding analyte IDs. Prerequisite: DataAnalysis.MS1_search()
- hm_label: str, optional
Labeling scheme for data points in heatmap, by default mz. Method mz visualizes m/z values. Method analyte visualizes analyte IDs. Prerequisite: DataAnalysis.MS1_search()
- jitter_amount: float, optional
Controls placement of feature label in volcano plot, by default 2
- jitter_factor: float, optional
Controls the repulsiveness of neighboring feature labels, by default 100
- font_size: float, optional
Font size of feature labels in volcano plot, by default 15
- get_hm: bool, optional
Renders a heatmap of feature label by ROI, with fold change as entries, if get_hm = True, by default True
- hm_width_factor: float, optional
Controls width of heatmaps, by default 10
- hm_height_factor: float, optional
Controls height of heatmaps, by default 30
- hm_fontsize: float, optional
Controls font size in heatmaps, by default 10
- hm_wspace: float, optional
Controls column spacing between heatmaps, by default 1.5
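The way pthreshold and FCthreshold jointly decide which points stand out in a volcano plot can be sketched as below. The sketch assumes fold change is the ratio of mean intensities between two ROIs and that the threshold applies symmetrically on the log2 scale; `classify_feature` is a hypothetical helper, not a function of the package.

```python
import math

def classify_feature(mean_a, mean_b, p_value, pthreshold=0.05, fc_threshold=1.5):
    """Label a feature as 'up', 'down', or 'not significant' for ROI A versus ROI B."""
    log2_fc = math.log2(mean_a / mean_b)
    cutoff = math.log2(fc_threshold)
    significant = p_value < pthreshold
    if significant and log2_fc >= cutoff:
        return "up"
    if significant and log2_fc <= -cutoff:
        return "down"
    return "not significant"

print(classify_feature(300.0, 100.0, 0.01))  # threefold higher in A, low p-value
print(classify_feature(120.0, 100.0, 0.01))  # significant p-value but small change
```

A feature needs both a small p-value and a large enough fold change to be flagged, which is why points in the upper corners of a volcano plot are the ones highlighted.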
- make_boxplot()[source]
Box plot comparison of mean ion intensity across ROIs with pairwise statistical comparison. Statistical significance thresholds are represented as * if 0.05 > $p$-value $\geq$ 0.01, ** if 0.01 > $p$-value $\geq$ 0.001, and *** if $p$-value $\leq$ 0.001.
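The star annotation can be expressed as a small lookup. This is a sketch of the stated thresholds only; `significance_stars` is a hypothetical name, and because the description lists a p-value of exactly 0.001 under both ** and ***, the sketch assumes *** for that boundary.

```python
def significance_stars(p_value):
    """Map a p-value to the star annotation described for the box plots."""
    if p_value <= 0.001:  # boundary case assumed to take the stronger label
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: no annotation

for p in (0.2, 0.03, 0.005, 0.0001):
    print(p, significance_stars(p))
```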
- normalize_pixel(method: str = 'TIC')[source]
Normalize pixels using specified method
- method: str, optional
Method to compute normalization factor, by default “TIC”. Method TIC normalizes on total ion count over each pixel. Method RMS normalizes on root mean square over each pixel. Method reference normalizes on a reference analyte specified by m/z or index. Method max normalizes on maximum intensity of each pixel.
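The four normalization factors can be compared on a single pixel spectrum. The sketch below is a standalone illustration, not the package's method: `normalize_spectrum` and `reference_index` are hypothetical names, the reference analyte is chosen by index only, and a zero factor (an empty pixel) is not handled.

```python
import math

def normalize_spectrum(spectrum, method="TIC", reference_index=0):
    """Divide one pixel's peak intensities by the chosen normalization factor."""
    if method == "TIC":
        factor = sum(spectrum)  # total ion count
    elif method == "RMS":
        factor = math.sqrt(sum(v * v for v in spectrum) / len(spectrum))
    elif method == "max":
        factor = max(spectrum)
    elif method == "reference":
        factor = spectrum[reference_index]  # intensity of the reference analyte
    else:
        raise ValueError(f"unknown normalization method: {method}")
    return [v / factor for v in spectrum]

pixel = [2.0, 6.0, 2.0]
tic = normalize_spectrum(pixel, "TIC")
rms = normalize_spectrum(pixel, "RMS")
peak_max = normalize_spectrum(pixel, "max")
ref = normalize_spectrum(pixel, "reference", reference_index=0)
print(tic, rms, peak_max, ref)
```

After TIC normalization the intensities of a pixel sum to 1, after max normalization the largest peak equals 1, and after reference normalization the reference analyte equals 1.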
- optimize_image_clustering(k_max: int = 10)[source]
Groups ion images by spatial co-localization. Outputs in situ and box plot visualizations of the mean ion image for each cluster. Clustering analysis and in situ mapping of clusters are summarized in a lower-dimensional t-SNE embedding.
- k_max: int, optional
Maximum number of clusters for cluster validity evaluation, by default 10
- optimize_insitu_clustering(k_max: int = 10)[source]
In situ segmentation via k-means clustering. Groups pixels by similarity in molecular profiles. Outputs in situ visualization of ROIs colored by cluster labels. In situ segmentation with cluster and ROI annotations is visualized in a lower-dimensional t-SNE embedding.
- k_max: int, optional
Maximum number of clusters for cluster validity evaluation, by default 10