pynibs.regression package¶
This holds methods for the TMS-based cortical localization approach as published in [1]
References
Numssen, O., Zier, A. L., Thielscher, A., Hartwigsen, G., Knösche, T. R., & Weise, K. (2021). Efficient high-resolution TMS mapping of the human motor cortex by nonlinear regression. NeuroImage, 245, 118654.
Submodules¶
pynibs.regression.regression module¶
- class pynibs.regression.regression.Element(x, y, ele_id, fun=<function sigmoid4>, score_type='R2', select_signed_data=False, constants=None, **kwargs)¶
Bases:
object
Fit Element object class
- calc_score()¶
Determine goodness-of-fit score.
- run_fit(max_nfev=1000)¶
Perform data fit with lmfit.
- run_select_signed_data()¶
Selects positive or negative data by performing an initial linear fit by comparing the resulting p-values, slopes and R2 values. Either positive or negative data (w.r.t. x-axis) yielding a fit with a p-value < 0.05, a positive slope and the higher R2 value is used and the remaining data with the other sign is omitted from the analysis
- set_constants(value)¶
Sets constants in self.constants and gmodel instance
- set_init_vals(value)¶
Sets initial values in self.init_vals and gmodel instance
- set_limits(value)¶
Sets limits in self.limits and gmodel instance
- Parameters:
value (dict) – Parameters (keys) to set in self as limits.
- set_random_init_vals()¶
Set random initial values
- setup_model()¶
Setup model parameters (limits, initial values, etc. …)
- pynibs.regression.regression.fit_elms(elm_idx_list, e_matrix, mep, zap_idx=None, fun=<function sigmoid4>, init_vals=None, limits=None, log_scale=False, constants=None, max_nfev=None, bad_elm_idx=None, score_type='R2', verbose=False)¶
Workhorse for Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned.
- Parameters:
elm_idx_list (list of int or np.ndarray) – List containing the element indices the fit is performed for
e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix
mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation
zap_idx (np.ndarray [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)
fun (str) – A function name of pynibs.exp.Mep (exp0, sigmoid)
init_vals (np.ndarray of dict) – Dictionary containing the initial values for each element as np.ndarray [len(elm_idx_list)]. The keys are the free parameters of fun, e.g. “x0”, “amp”, etc
limits (pd.DataFrame) – Dictionary containing the limits of each parameter for each element e.g.: limits[“x0”][elm_idx] = [min, max]
log_scale (bool, default: False) – Log-transform data before fit (necessary for functions defined in the log domain)
constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.
max_nfev (int, default: None) – Max fits, passed to model.fit() as max_nfev=max_nfev*len(x).
bad_elm_idx (np.ndarray) – Indices of elements not to fit, with indices corresponding to indices (not values) of elm_idx_list
score_type (str, default: "R2") – Goodness of fit measure; Choose SR for nonlinear fits and R2 or SR for linear fits: * “R2”: R2 score (Model variance / Total variance) [0, 1] for linear models; 0: bad fit; 1: perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm) [-inf, 1]; 1: perfect fit
verbose (bool, default: False) – Print verbosity information
- Returns:
r2 (np.ndarray of float [n_roi, 1]) – R2 for each element in elm_idx_list
best_values (np.ndarray of object) – Fit parameters returned from the optimizer
- pynibs.regression.regression.get_bad_elms(x, y, method='lstsq', verbose=False)¶
This does an element-wise fast linear regression fit to identify bad elements. Bad is defined here as a negative slope.
- xnp.ndarray of float [n_zaps x n_ele]
Electric field matrix
- ynp.ndarray of float [n_zaps]
Motor evoked potentials for every stimulation
- methodstr, default: “lstsq”
Which method to use. (numpy.linalg.)lstsq, (scipy.stats.)linregress, or pinv
- verbosebool, default: False
Print verbosity information
- Returns:
idx – Indices of bad elements
- Return type:
np.ndarray
- pynibs.regression.regression.get_model_init_values(fun, elm_idx_list, e_matrix, mep, mask_e_field=None, rem_empty_hints=True)¶
Calc appropriate init, limit, and max values for models fits depending on the data. If negative and positive x-data is present in case of e.g. normal component values are set according to the side (positive or negative) where more values are present. When more positive x-axis values are present, negative x-axis values will be ignored. When more negative x-axis values are present, the absolute values will be taken and the positive values are ignored.
Only parameters for sigmoid* are optimized.
- Parameters:
fun (pyfempp.exp.Mep) – IO curve function object
elm_idx_list (np.ndarray of int) – Array containing the element indices the fit is performed for
e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix
mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation
mask_e_field (np.ndarray of bool [n_zaps x n_ele], default: None) – Mask indicating for which e-field (and mep) values the fit is performed for. Changes for normal component in each element because of the sign and p-values. If None, all data is used in each element.
rem_empty_hints (bool, default: True) – Remove any non-filled param hints from limits dict.
- Returns:
log_scale (bool) – Log scale
limits (dict of list [n elm_index_list]) – Element-wise limit values for function fitting.
init_vals (dict of list [n elm_index_list]) – Element-wise init values for function fitting.
max_vals_refit (dict of list [n elm_index_list]) – Element-wise perturbation range for refitting function.
- pynibs.regression.regression.init(l, zap_lists, res_fn)¶
Pool init function to use with regression_nl_hdf5_single_core_write
- Parameters:
l (multiprocessing.Lock()) –
res_fn (str) – .hdf5 fn
- pynibs.regression.regression.logistic_regression()¶
Some ideas on how to improve regression approach
De-log data
Data range has to be transformed to a reasonable range. For a full sigmoid, -10:10 looks ok
sig <- function(z) { return( 1 / (1 + exp(-z))) }
desig <- function(x) { return(- log((1/x) - 1)) }
This might be a reasonable fast approach, but the parameter range has to be estimated. Maybe remove some outliters?
2. fit logistic regression to raw data scipy.optimize provides fit_curve(), which does OLS-ish fitting to a given function https://stackoverflow.com/questions/54376900/fit-sigmoid-curve-in-python
I expect this to be rather slow.
3. Use the sklearn logistic_regression classifyer and access raw fit data The logistic_regression is implemented as a classifyer, maybe it’s possible to use its regression fit results. Implementation should be pretty fast.
- pynibs.regression.regression.nl_hdf5(elm_idx_list=None, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, zap_idx=None, n_cpu=4, con=None, n_refit=50, return_fits=False, score_type='R2', verbose=False, pool=None, refit_discontinuities=True)¶
Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.
- Parameters:
elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for
fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes
qoi_path_hdf5 (Union[str, list[str]]) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]
e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix
mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation
zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)
fun (pynibs.exp.Mep function) – A function of pynibs.exp.Mep (exp0, sigmoid)
n_cpu (int) – Number of threads
con (np.ndarray of float [n_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)
n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.
return_fits (bool, default: False) – Return fit objects containing the parameter estimates
score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit
verbose (bool, default: False) – Plot output messages
pool (multiprocessing.Pool()) – pool instance to use.
refit_discontinuities (bool, default: True) – Run refit for discontinuous elements at the end
- Returns:
r2 – R2 for each element in elm_idx_list
- Return type:
np.ndarray of float [n_roi, n_qoi]
- pynibs.regression.regression.nl_hdf5_single_core(zap_idx, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, rem_bad_elms=True, return_e_field_stats=True)¶
Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise r2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.
- Parameters:
elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for
fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes
qoi_path_hdf5 (str or list of str) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]
e_matrix (np.ndarray of float [n_zaps x n_ele]) – Electric field matrix
mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation
zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)
fun (function object) – A function of pynibs.exp.Mep (exp0, sigmoid)
con (np.ndarray of float [n_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)
n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.
return_fits (bool, default: False) – Return fit objects containing the parameter estimates
constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.
verbose (bool, default: False) – Plot output messages
seed (int, default: None) – Seed to use.
rem_bad_elms (bool, default: True) – Remove elements based on a fast linear regression slope estimation
return_e_field_stats (bool, default: True) – Return some stats on the efield variance
- Returns:
r2 – R2 for each element in elm_idx_list
- Return type:
np.ndarray of float [n_roi, n_qoi]
- pynibs.regression.regression.nl_hdf5_single_core_write(i, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, stepdown=False, score_type='R2', return_progress=False, geo=None)¶
- pynibs.regression.regression.regress_data(e_matrix, mep, elm_idx_list=None, element_list=None, fun=<function sigmoid4>, n_cpu=4, con=None, n_refit=50, zap_idx=None, return_fits=False, score_type='R2', verbose=False, pool=None, refit_discontinuities=True, select_signed_data=False, mp_context='fork', **kwargs)¶
Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise R2 score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file.
- Parameters:
e_matrix (np.ndarray of float) – (n_zaps, n_ele) Electric field matrix
mep (np.ndarray of float) – (n_zaps,) Motor evoked potential for each stimulation
elm_idx_list (np.ndarray of int or list of int) – (n_zaps,) List containing the element indices the fit is performed for
element_list (list of Element object instances, optional) – [n_ele] pynibs.Element objects ot skip initialization here.
fun (pynibs.exp.Mep, default: pynibs.sigmoid4) – A pynibs.exp.Mep function (exp0, sigmoid, sigmoid4, …)
n_cpu (int) – Number of threads, if n_cpu=1 no parallel pool will be opened and all calculations are done in serial
con (np.ndarray of float, optional) – (n_ele, 3 or 4) Connectivity matrix of ROI. Needed in case of refit for discontinuity checks
n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.
zap_idx (np.ndarray of int or list of int, optional) – Which e/mep pairs to use.
return_fits (bool, optional) – Return fit objects containing the parameter estimates
score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit * “rho”: Spearman correlation coefficient [-1, 1]; finds any monotonous correlation (0 means no correlation)
verbose (bool, default: False) – Plot output messages
pool (multiprocessing.Pool()) – pool instance to use
refit_discontinuities (bool, default: True) – Refit discontinuous elements. If True, provide _con_
mp_context (str, default: "fork") –
Controls the method the sub-processes of the multiprocessing pool (in case of n_cpu > 1) are launched. * fork: (only supported by Unix) mp processes diverge from the main process,
the entire stack, variables and other resources are copied over. From the docs: “The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.”
spawn: (supported by Window and Unix) mp processes are launched in an entirely new Python interpreter as separate processes. Variables are copied other resources are freshly instantiated. From the docs: “In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.”
**kwargs – Passed on to pynibs.Element() to set fit parameters.
- Returns:
score (np.ndarray of float (n_roi, n_qoi)) – Score for each element.
best_values (list of dict (n_ele)) – List of parameter fits, only return if
return_fits=True
.
- pynibs.regression.regression.ridge_from_hdf5(elm_idx_list, fn_reg_hdf5, qoi_path_hdf5, zap_idx=None)¶
Mass-univariate ridge regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned. The function reads the precomputed array of E-MEP data from an .hdf5 file. Always uses all cores of a machine!
- elm_idx_listlist of int
List containing the element indices the fit is performed for
- fn_hdf5str
Filename (incl. path) containing the precomputed E-MEP dataframes
- qoi_path_hdf5str
Path in .hdf5 file to dataset of electric field qoi
- zap_idxnp.ndarray [n_zaps], default: None
Indices of zaps the congruence factor is calculated with (default: all)
- Returns:
r2 – R^2 for each element in elm_idx_list
- Return type:
np.ndarray of float [n_roi, n_datasets]
- pynibs.regression.regression.sing_elm_fitted(elm_idx_list, mep_lst, mep_params, e, alpha=1000, n_samples=100)¶
Mass-univariate ridge regressions on fitted MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned.
- elm_idx_list: np.ndarray [chunksize]
List of element indices, the congruence factor is computed for
- mep_lst: list of Mep object instances [n_cond]
List of fitted Mep object instances for all conditions (see exp.py for more information of Mep class)
- mep_params: np.ndarray of float [n_mep_params_total]
List of all mep parameters of curve fits used to calculate the MEP (accumulated into one array) (e.g.: [mep_#1_para_#1, mep_#1_para_#2, mep_#1_para_#3, mep_#2_para_#1, mep_#2_para_#1, …])
- e: np.ndarray of float with e.shape == (n_elm, n_cond, n_qoi)
array of the electric field to compute the r2 factor for, e.g. (e_mag, e_norm, e_tan)
- n_samplesint, default=100
Number of data points to generate discrete mep and e curves
- Returns:
r2 – R^2 for each element in elm_idx_list
- Return type:
np.ndarray of float [n_roi, n_datasets]
- pynibs.regression.regression.sing_elm_raw(elm_idx_list, mep_lst, mep_params, e, alpha=1000)¶
Mass-univariate ridge regressions on raw MEP_{AMP} ~ E. That is, for each element in elm_idx_list, it’s E (mag | norm | tan) for each zap regressed on the raw MEP amplitude. An element wise sklearn.metrics.regression.r2_score is returned.
- elm_idx_list: np.ndarray [chunksize]
List of element indices, the congruence factor is computed for
- mep: list of Mep object instances [n_cond]
List of fitted Mep object instances for all conditions (see exp.py for more information of Mep class)
- mep_params: np.ndarray of float [n_mep_params_total]
List of all mep parameters of curve fits used to calculate the MEP (accumulated into one array) (e.g.: [mep_#1_para_#1, mep_#1_para_#2, mep_#1_para_#3, mep_#2_para_#1, mep_#2_para_#1, …])
- e: np.ndarray of float with e.shape == (n_elm, n_cond, n_qoi)
array of the electric field to compute the r2 factor for, e.g. (e_mag, e_norm, e_tan)
- Returns:
r2 – R^2 for each element in elm_idx_list
- Return type:
np.ndarray of float [n_roi, n_datasets]
- pynibs.regression.regression.single_fit(x, y, fun)¶
Performs a single fit and returns fit object.
- pynibs.regression.regression.stepdown_approach(zap_idx, elm_idx_list, fn_reg_hdf5=None, qoi_path_hdf5=None, e_matrix=None, mep=None, fun=<function sigmoid4>, con=None, n_refit=50, return_fits=False, constants=None, verbose=False, seed=None, rem_bad_elms=True, return_e_field_stats=True, score_type='R2', return_progress=False, smooth_data=True, geo=None)¶
Mass-univariate nonlinear regressions on raw MEP_{AMP} ~ E in a stepdown manner to speed up computation.
Initially, one set of fits is done for the complete dataset. Afterwards, the best 1% of the elements are used as initial fitting parameters for their neighboring elements. Then, neighboring elements are fitted accordingly and iterativley. Finally, discontinuous elements are refitted until a smooth map is found or n_refit is hit.
Can be sped up with rem_bad_elms that computes a fast linear fit to identify elements with a negative slope.
The function reads the precomputed array of E-MEP data from an .hdf5 file.
- Parameters:
elm_idx_list (np.ndarray of int) – List containing the element indices the fit is performed for
fn_reg_hdf5 (str) – Filename (incl. path) containing the precomputed E-MEP dataframes
qoi_path_hdf5 (str or list of str) – Path in .hdf5 file to dataset of electric field qoi e.g.: [“E”, “E_norm”, “E_tan”]
e_matrix (np.ndarray of float [n_zaps x n_ele] | [n_zapidx x n_ele]) – Electric field matrix
mep (np.ndarray of float [n_zaps]) – Motor evoked potentials for every stimulation
zap_idx (np.array [n_zaps], default: None) – Indices of zaps the congruence factor is calculated with (default: all)
fun (function object) – A function of pynibs.exp.Mep (exp0, sigmoid)
con (np.ndarray of float [n_elm_roi, 3 or 4], default: None) – Connectivity matrix of ROI (needed in case of refit because of discontinuity check)
n_refit (int, default: 50) – Maximum number of refits of zero elements. No refit is applied in case of n_refit = 0.
return_fits (bool, default: False) – Return fit objects containing the parameter estimates
constants (dict of <string>:<num>, default: None) – key:value pair of model parameters not to optimize.
verbose (bool, default: False) – Plot output messages
seed (int, default: None) – Seed to use.
rem_bad_elms (bool, default: True) – Remove elements based on a fast linear regression slope estimation
return_e_field_stats (bool, default: True) – Return some stats on the efield variance
score_type (str, default: "R2") – Error measure of fit: * “R2”: R2 score (Model variance / Total variance); linear fits: [0, 1], 1 … perfect fit * “SR”: Relative standard error of regression (1 - Error 2-norm / Data 2-norm); [-Inf, 1], 1 … perfect fit * “rho”: Spearman correlation coefficient [-1, 1]; finds any monotonous correlation (0 means no correlation)
return_progress (bool, default: False) – Return c maps for all steps to allow visualization over e-fitting over timesteps.
smooth_data (bool, default: False) – Smooth c-map as final step.
geo (object, optional) – Geometriy data.
- Returns:
- r2np.ndarray of float [n_roi, n_qoi]
R2 for each element in elm_idx_list
- best_values: list of dict, optional
fit information
- statsdict, optional
- ’mc’: float, optional
Mutual coherence for e fields
- ’sv_rat’float, optional
SVD singular value ratio
progress : cmaps for each step
- Return type:
- pynibs.regression.regression.workhorse_element_init(ele_id, e_matrix, mep, fun, score_type, select_signed_data, constants, **kwargs)¶
Workhorse to initialize Elements.
- pynibs.regression.regression.workhorse_element_run_fit(element, max_nfev=10)¶
Workhorse to run single Element fit. If status is False, the element will not be fitted.
- pynibs.regression.regression.write_regression_hdf5(fn_exp_hdf5, fn_reg_hdf5, qoi_path_hdf5, qoi_phys, e_results_folder, qoi_e, roi_idx, conds_exclude)¶
Reads the stimulation intensities from the experiment.hdf5 file. Reads the qoi from the experiment.hdf5 file. Reads the electric fields from the electric field folder. Weights the electric field voxel wise with the respective intensities and writes an .hdf5 file containing the preprocessed data (pandas dataframe).
- fn_exp_hdf5: str
Filename of the experiment.hdf5 file
- fn_reg_hdf5: str
Filename of output regression.hdf5 file
- qoi_path_hdf5: str
Path in experiment.hdf5 file pointing to the pandas dataframe containing the qoi (e.g.: “phys_data/postproc/EMG”)
- qoi: str
Name of QOI the congruence factor is calculated with (e.g.: “p2p”)
- fn_e_results: str
Folder containing the electric fields (e.g.: “/data/pt_01756/probands/13061.30/results/electric_field/1”)
- qoi_e: str or list of str
Quantities of the electric field used to calculate the congruence factor (e.g. [“E”, “E_norm”, “E_tan”] Has to be included in e.hdf5 -> e.g.: “data/midlayer/roi_surface/1/E”
- roi_idx: int
ROI index
- conds_exclude: str or list of str
Conditions to exclude
- Returns:
<File> – File containing the intensity (current) scaled E-fields of the conditions in the ROI. Saved in datasets with the same name as qoi_e [“E”, “E_norm”, “E_tan”]
- Return type:
.hdf5 file