SciPy

pynibs.regression package

This holds methods for the TMS-based cortical localization approach as published in [1]

References

[1]

Numssen, O., Zier, A. L., Thielscher, A., Hartwigsen, G., Knösche, T. R., & Weise, K. (2021). Efficient high-resolution TMS mapping of the human motor cortex by nonlinear regression. NeuroImage, 245, 118654.

Submodules

pynibs.regression.regression module

class pynibs.regression.regression.Element(x, y, ele_id, fun=<function sigmoid4>, score_type='R2', select_signed_data=False, constants=None, **kwargs)

Bases: object

Fit Element object class

calc_score()

Determine goodness-of-fit score.

run_fit(max_nfev=1000)

Perform data fit with lmfit.

run_select_signed_data()

Selects positive or negative data by performing an initial linear fit by comparing the resulting p-values, slopes and R2 values. Either positive or negative data (w.r.t. x-axis) yielding a fit with a p-value < 0.05, a positive slope and the higher R2 value is used and the remaining data with the other sign is omitted from the analysis

set_constants(value)

Sets constants in self.constants and gmodel instance

set_init_vals(value)

Sets initial values in self.init_vals and gmodel instance

set_limits(value)

Sets limits in self.limits and gmodel instance

Parameters:

value (dict) – Parameters (keys) to set in self as limits.

set_random_init_vals()

Set random initial values

setup_model()

Setup model parameters (limits, initial values, etc. …)

pynibs.regression.regression.fit_elms(elm_idx_list, e_matrix, mep, zap_idx=None, fun=<function sigmoid4>, init_vals=None, limits=None, log_scale=False, constants=None, max_nfev=None, bad_elm_idx=None, score_type='R2', verbose=False)

Workhorse for Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned.

Parameters:
  • elm_idx_list (list of int or np.ndarray) – List containing the element indices the fit is performed for

  • e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix

  • mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation

  • zap_idx (np.ndarray [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)

  • fun (str) – A function name of pynibs.exp.Mep (exp0, sigmoid)

  • init_vals (np.ndarray of dict) – Dictionary containing the initial values for each element as np.ndarray [len(elm_idx_list)]. The keys are the free parameters of fun, e.g. “x0”, “amp”, etc

  • limits (pd.DataFrame) – Dictionary containing the limits of each parameter for each element e.g.: limits[“x0”][elm_idx] = [min, max]

  • log_scale (bool, default: False) – Log-transform data before fit (necessary for functions defined in the log domain)

  • constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.

  • max_nfev (int, default: None) – Max fits, passed to model.fit() as max_nfev=max_nfev*len(x).

  • bad_elm_idx (np.ndarray) – Indices of elements not to fit, with indices corresponding to indices (not values) of elm_idx_list

  • score_type (str, default: "R2") – Goodness of fit measure; Choose SR for nonlinear fits and R2 or SR for linear fits: * “R2”: R2 score (Model variance / Total variance) [0, 1] for linear models; 0: bad fit; 1: perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm) [-inf, 1]; 1: perfect fit

  • verbose (bool, default: False) – Print verbosity information

Returns:

  • r2 (np.ndarray of float [n_roi, 1]) – R2 for each element in elm_idx_list

  • best_values (np.ndarray of object) – Fit parameters returned from the optimizer

pynibs.regression.regression.get_bad_elms(x, y, method='lstsq', verbose=False)

This does an element-wise fast linear regression fit to identify bad elements. Bad is defined here as a negative slope.

xnp.ndarray of float [n_zaps x n_ele]

Electric field matrix

ynp.ndarray of float [n_zaps]

Motor evoked potentials for every stimulation

methodstr, default: “lstsq”

Which method to use. (numpy.linalg.)lstsq, (scipy.stats.)linregress, or pinv

verbosebool, default: False

Print verbosity information

Returns:

idx – Indices of bad elements

Return type:

np.ndarray

pynibs.regression.regression.get_model_init_values(fun, elm_idx_list, e_matrix, mep, mask_e_field=None, rem_empty_hints=True)

Calc appropriate init, limit, and max values for models fits depending on the data. If negative and positive x-data is present in case of e.g. normal component values are set according to the side (positive or negative) where more values are present. When more positive x-axis values are present, negative x-axis values will be ignored. When more negative x-axis values are present, the absolute values will be taken and the positive values are ignored.

Only parameters for sigmoid* are optimized.

Parameters:
  • fun (pyfempp.exp.Mep) – IO curve function object

  • elm_idx_list (np.ndarray of int) – Array containing the element indices the fit is performed for

  • e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix

  • mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation

  • mask_e_field (np.ndarray of bool [n_zaps x n_ele], default: None) – Mask indicating for which e-field (and mep) values the fit is performed for. Changes for normal component in each element because of the sign and p-values. If None, all data is used in each element.

  • rem_empty_hints (bool, default: True) – Remove any non-filled param hints from limits dict.

Returns:

  • log_scale (bool) – Log scale

  • limits (dict of list [n elm_index_list]) – Element-wise limit values for function fitting.

  • init_vals (dict of list [n elm_index_list]) – Element-wise init values for function fitting.

  • max_vals_refit (dict of list [n elm_index_list]) – Element-wise perturbation range for refitting function.

pynibs.regression.regression.init(l, zap_lists, res_fn)

Pool init function to use with regression_nl_hdf5_single_core_write

Parameters:
pynibs.regression.regression.logistic_regression()

Some ideas on how to improve regression approach

  1. De-log data

Data range has to be transformed to a reasonable range. For a full sigmoid, -10:10 looks ok

sig <- function(z) {
  return( 1 / (1 + exp(-z)))
}
desig <- function(x) {
  return(- log((1/x) - 1))
}

This might be a reasonable fast approach, but the parameter range has to be estimated. Maybe remove some outliters?

2. fit logistic regression to raw data scipy.optimize provides fit_curve(), which does OLS-ish fitting to a given function https://stackoverflow.com/questions/54376900/fit-sigmoid-curve-in-python

I expect this to be rather slow.

3. Use the sklearn logistic_regression classifyer and access raw fit data The logistic_regression is implemented as a classifyer, maybe it’s possible to use its regression fit results. Implementation should be pretty fast.

pynibs.regression.regression.nl_hdf5(elm_idx_list=None, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, zap_idx=None, n_cpu=4, con=None, n_refit=50, return_fits=False, score_type='R2', verbose=False, pool=None, refit_discontinuities=True)

Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.

Parameters:
  • elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for

  • fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes

  • qoi_path_hdf5 (Union[str, list[str]]) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]

  • e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix

  • mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation

  • zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)

  • fun (pynibs.exp.Mep function) – A function of pynibs.exp.Mep (exp0, sigmoid)

  • n_cpu (int) – Number of threads

  • con (np.ndarray of float [n_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)

  • n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.

  • return_fits (bool, default: False) – Return fit objects containing the parameter estimates

  • score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit

  • verbose (bool, default: False) – Plot output messages

  • pool (multiprocessing.Pool()) – pool instance to use.

  • refit_discontinuities (bool, default: True) – Run refit for discontinuous elements at the end

Returns:

r2 – R2 for each element in elm_idx_list

Return type:

np.ndarray of float [n_roi, n_qoi]

pynibs.regression.regression.nl_hdf5_single_core(zap_idx, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, rem_bad_elms=True, return_e_field_stats=True)

Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.

Parameters:
  • elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for

  • fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes

  • qoi_path_hdf5 (str or list of str) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]

  • e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix

  • mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation

  • zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)

  • fun (function object) – A function of pynibs.exp.Mep (exp0, sigmoid)

  • con (np.ndarray of float [n_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)

  • n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.

  • return_fits (bool, default: False) – Return fit objects containing the parameter estimates

  • constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.

  • verbose (bool, default: False) – Plot output messages

  • seed (int, default: None) – Seed to use.

  • rem_bad_elms (bool, default: True) – Remove elements based on a fast linear regression slope estimation

  • return_e_field_stats (bool, default: True) – Return some stats on the efield variance

Returns:

r2 – R2 for each element in elm_idx_list

Return type:

np.ndarray of float [n_roi, n_qoi]

pynibs.regression.regression.nl_hdf5_single_core_write(i, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, stepdown=False, score_type='R2', return_progress=False, geo=None)
pynibs.regression.regression.regress_data(e_matrix, mep, elm_idx_list=None, element_list=None, fun=<function sigmoid4>, n_cpu=4, con=None, n_refit=50, zap_idx=None, return_fits=False, score_type='R2', verbose=False, pool=None, refit_discontinuities=True, select_signed_data=False, mp_context='fork', **kwargs)

Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise R2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.

Parameters:
  • e_matrix (np.ndarray of float) – (n_zaps, n_ele) Electric field matrix

  • mep (np.ndarray of float) – (n_zaps,) Motor evoked potential for each stimulation

  • elm_idx_list (np.ndarray of int or list of int) – (n_zaps,) List containing the element indices the fit is performed for

  • element_list (list of Element object instances, optional) – [n_ele] pynibs.Element objects ot skip initialization here.

  • fun (pynibs.exp.Mep, default: pynibs.sigmoid4) – A pynibs.exp.Mep function (exp0, sigmoid, sigmoid4, …)

  • n_cpu (int) – Number of threads, if n_cpu=1 no parallel pool will be opened and all calculations are done in serial

  • con (np.ndarray of float, optional) – (n_ele, 3 or 4) Connectivity matrix of ROI. Needed in case of refit for discontinuity checks

  • n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.

  • zap_idx (np.ndarray of int or list of int, optional) – Which e/mep pairs to use.

  • return_fits (bool, optional) – Return fit objects containing the parameter estimates

  • score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit * “rho”: Spearman correlation coefficient [-1, 1]; finds any monotonous correlation (0 means no correlation)

  • verbose (bool, default: False) – Plot output messages

  • pool (multiprocessing.Pool()) – pool instance to use

  • refit_discontinuities (bool, default: True) – Refit discontinuous elements. If True, provide _con_

  • mp_context (str, default: "fork") –

    Controls the method the sub-processes of the multiprocessing pool (in case of n_cpu > 1) are launched. * fork: (only supported by Unix) mp processes diverge from the main process,

    the entire stack, variables and other resources are copied over. From the docs: “The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.”

    • spawn: (supported by Window and Unix) mp processes are launched in an entirely new Python interpreter as separate processes. Variables are copied other resources are freshly instantiated. From the docs: “In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.”

  • **kwargs – Passed on to pynibs.Element() to set fit parameters.

Returns:

  • score (np.ndarray of float (n_roi, n_qoi)) – Score for each element.

  • best_values (list of dict (n_ele)) – List of parameter fits, only return if return_fits=True.

pynibs.regression.regression.ridge_from_hdf5(elm_idx_list, fn_reg_hdf5, qoi_path_hdf5, zap_idx=None)

Mass-univariate ridge regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file. Always uses all cores of a machine!

elm_idx_listlist of int

List containing the element indices the fit is performed for

fn_hdf5str

Filename (incl. path) containing the precomputed E-MEP dataframes

qoi_path_hdf5str

Path in .hdf5 file to dataset of electric field qoi

zap_idxnp.ndarray [n_zaps], default: None

Indices of zaps the congruence factor is calculated with (default: all)

Returns:

r2 – R^2 for each element in elm_idx_list

Return type:

np.ndarray of float [n_roi, n_datasets]

pynibs.regression.regression.sing_elm_fitted(elm_idx_list, mep_lst, mep_params, e, alpha=1000, n_samples=100)

Mass-univariate ridge regressions on fitted MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned.

elm_idx_list: np.ndarray [chunksize]

List of element indices, the congruence factor is computed for

mep_lst: list of Mep object instances [n_cond]

List of fitted Mep object instances for all conditions (see exp.py for more information of Mep class)

mep_params: np.ndarray of float [n_mep_params_total]

List of all mep parameters of curve fits used to calculate the MEP (accumulated into one array) (e.g.: [mep_#1_para_#1, mep_#1_para_#2, mep_#1_para_#3, mep_#2_para_#1, mep_#2_para_#1, …])

e: np.ndarray of float with e.shape == (n_elm, n_cond, n_qoi)

array of the electric field to compute the r2 factor for, e.g. (e_mag, e_norm, e_tan)

n_samplesint, default=100

Number of data points to generate discrete mep and e curves

Returns:

r2 – R^2 for each element in elm_idx_list

Return type:

np.ndarray of float [n_roi, n_datasets]

pynibs.regression.regression.sing_elm_raw(elm_idx_list, mep_lst, mep_params, e, alpha=1000)

Mass-univariate ridge regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned.

elm_idx_list: np.ndarray [chunksize]

List of element indices, the congruence factor is computed for

mep: list of Mep object instances [n_cond]

List of fitted Mep object instances for all conditions (see exp.py for more information of Mep class)

mep_params: np.ndarray of float [n_mep_params_total]

List of all mep parameters of curve fits used to calculate the MEP (accumulated into one array) (e.g.: [mep_#1_para_#1, mep_#1_para_#2, mep_#1_para_#3, mep_#2_para_#1, mep_#2_para_#1, …])

e: np.ndarray of float with e.shape == (n_elm, n_cond, n_qoi)

array of the electric field to compute the r2 factor for, e.g. (e_mag, e_norm, e_tan)

Returns:

r2 – R^2 for each element in elm_idx_list

Return type:

np.ndarray of float [n_roi, n_datasets]

pynibs.regression.regression.single_fit(x, y, fun)

Performs a single fit and returns fit object.

Parameters:
  • x (ndarray of float) – x-values.

  • y (ndarray of float) – y-values.

  • fun (function) – Function for fitting.

Returns:

fit – Fit object.

Return type:

gmodel fit object

pynibs.regression.regression.stepdown_approach(zap_idx, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, rem_bad_elms=True, return_e_field_stats=True, score_type='R2', return_progress=False, smooth_data=True, geo=None)

Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E in a stepdown manner to speed up computation.

Initially, one set of fits is done for the complete dataset. Afterwards, the best 1% of the elements are used as initial fitting parameters for their neighboring elements. Then, neighboring elements are fitted accordingly and iterativley. Finally, discontinuous elements are refitted until a smooth map is found or n_refit is hit.

Can be sped up with rem_bad_elms that computes a fast linear fit to identify elements with a negative slope.

The function reads the precomputed array of E-MEP data from an .hdf5 file.

Parameters:
  • elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for

  • fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes

  • qoi_path_hdf5 (str or list of str) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]

  • e_matrix (np.ndarray of float [n_zaps x n_ele] | [n_zapidx x n_ele]) – Electric field matrix

  • mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation

  • zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)

  • fun (function object) – A function of pynibs.exp.Mep (exp0, sigmoid)

  • con (np.ndarray of float [n_elm_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)

  • n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.

  • return_fits (bool, default: False) – Return fit objects containing the parameter estimates

  • constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.

  • verbose (bool, default: False) – Plot output messages

  • seed (int, default: None) – Seed to use.

  • rem_bad_elms (bool, default: True) – Remove elements based on a fast linear regression slope estimation

  • return_e_field_stats (bool, default: True) – Return some stats on the efield variance

  • score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit * “rho”: Spearman correlation coefficient [-1, 1]; finds any monotonous correlation (0 means no correlation)

  • return_progress (bool, default: False) – Return c maps for all steps to allow visualization over e-fitting over timesteps.

  • smooth_data (bool, default: False) – Smooth c-map as final step.

  • geo (object, optional) – Geometriy data.

Returns:

r2np.ndarray of float [n_roi, n_qoi]

R2 for each element in elm_idx_list

best_values: list of dict, optional

fit information

statsdict, optional
’mc’: float, optional

Mutual coherence for e fields

’sv_rat’float, optional

SVD singular value ratio

progress : cmaps for each step

Return type:

dict

pynibs.regression.regression.workhorse_element_init(ele_id, e_matrix, mep, fun, score_type, select_signed_data, constants, **kwargs)

Workhorse to initialize Elements.

pynibs.regression.regression.workhorse_element_run_fit(element, max_nfev=10)

Workhorse to run single Element fit. If status is False, the element will not be fitted.

pynibs.regression.regression.write_regression_hdf5(fn_exp_hdf5, fn_reg_hdf5, qoi_path_hdf5, qoi_phys, e_results_folder, qoi_e, roi_idx, conds_exclude)

Reads the stimulation intensities from the experiment.hdf5 file. Reads the qoi from the experiment.hdf5 file. Reads the electric fields from the electric field folder. Weights the electric field voxel wise with the respective intensities and writes an .hdf5 file containing the preprocessed data (pandas dataframe).

fn_exp_hdf5: str

Filename of the experiment.hdf5 file

fn_reg_hdf5: str

Filename of output regression.hdf5 file

qoi_path_hdf5: str

Path in experiment.hdf5 file pointing to the pandas dataframe containing the qoi (e.g.: “phys_data/postproc/EMG”)

qoi: str

Name of QOI the congruence factor is calculated with (e.g.: “p2p”)

fn_e_results: str

Folder containing the electric fields (e.g.: “/data/pt_01756/probands/13061.30/results/electric_field/1”)

qoi_e: str or list of str

Quantities of the electric field used to calculate the congruence factor (e.g. [“E”, “E_norm”, “E_tan”] Has to be included in e.hdf5 -> e.g.: “data/midlayer/roi_surface/1/E”

roi_idx: int

ROI index

conds_exclude: str or list of str

Conditions to exclude

Returns:

<File> – File containing the intensity (current) scaled E-fields of the conditions in the ROI. Saved in datasets with the same name as qoi_e [“E”, “E_norm”, “E_tan”]

Return type:

.hdf5 file

pynibs.regression.score_types module