yahist package¶
Submodules¶
yahist.fit module¶
-
yahist.fit.
calculate_hessian
(func, x0, epsilon=1e-05)[source]¶ # Taken almost verbatim from https://gist.github.com/jgomezdans/3144636 A numerical approximation to the Hessian matrix of cost function at location x0
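For orientation, a numerical Hessian can be approximated with central finite differences. The sketch below is a plain-Python illustration under that assumption; it is not the code from the gist referenced above, and the name numerical_hessian is hypothetical.

```python
def numerical_hessian(func, x0, epsilon=1e-5):
    """Approximate the Hessian of func at x0 with central differences.

    Illustrative stand-in only (hypothetical helper, not yahist code).
    """
    n = len(x0)
    hessian = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j, dj):
        # Evaluate func with coordinates i and j displaced by di and dj.
        point = list(x0)
        point[i] += di
        point[j] += dj
        return func(point)

    for i in range(n):
        for j in range(n):
            hessian[i][j] = (
                shifted(i, epsilon, j, epsilon)
                - shifted(i, epsilon, j, -epsilon)
                - shifted(i, -epsilon, j, epsilon)
                + shifted(i, -epsilon, j, -epsilon)
            ) / (4.0 * epsilon ** 2)
    return hessian


# Quadratic cost function whose exact Hessian is [[2, 3], [3, 4]].
cost = lambda p: p[0] ** 2 + 3 * p[0] * p[1] + 2 * p[1] ** 2
H = numerical_hessian(cost, [1.0, 2.0], epsilon=1e-4)
```

A larger epsilon reduces round-off error at the price of truncation error, so the best step size depends on the scale of the cost function.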
-
yahist.fit.
curve_fit_wrapper
(func, xdata, ydata, sigma=None, absolute_sigma=True, likelihood=False, **kwargs)[source]¶ Wrapper around scipy.optimize.curve_fit. Initial parameters (p0) can be set in the function definition with defaults for kwargs (e.g., func = lambda x,a=1.,b=2.: x+a+b, will feed p0 = [1.,2.] to curve_fit)
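One way such defaults could be collected is with the standard inspect module. This is a minimal sketch of the idea, assuming the first argument is the x variable; the helper name initial_params_from_defaults is hypothetical and the wrapper's actual mechanism is not described here.

```python
import inspect


def initial_params_from_defaults(func):
    """Collect the default values of every argument after the first (x)."""
    params = list(inspect.signature(func).parameters.values())[1:]
    # Arguments without a default are skipped in this sketch.
    return [p.default for p in params if p.default is not inspect.Parameter.empty]


func = lambda x, a=1.0, b=2.0: x + a + b
p0 = initial_params_from_defaults(func)  # candidate initial parameters
```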
-
yahist.fit.
expr_to_lambda
(expr)[source]¶ - Converts a string expression like
“a+b*np.exp(-c*x+math.pi)”
- into a lambda function with 1 variable and N parameters,
lambda x,a,b,c: a+b*np.exp(-c*x+math.pi)
x is assumed to be the main variable. Very simple logic that ignores things like foo.bar or foo( from being considered a parameter.
- Parameters
expr (str) –
- Returns
- Return type
callable/lambda
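The conversion described above can be sketched with a regular expression that skips names used as foo.bar or foo(. This is an illustration only: expression_to_function is a hypothetical helper, it exposes only the math module to the expression, and eval should never be given untrusted input.

```python
import math
import re


def expression_to_function(expr, variable="x"):
    """Turn an expression string into a function of (variable, *parameters)."""
    # Keep identifiers that are not preceded by "." and not followed by "." or "(".
    names = re.findall(r"(?<![\w.])[A-Za-z_]\w*(?![\w.(])", expr)
    params = []
    for name in names:
        if name != variable and name not in params:
            params.append(name)  # preserve order of first appearance
    source = "lambda {}: {}".format(",".join([variable] + params), expr)
    # Only "math" is visible to the expression in this sketch.
    return eval(source, {"math": math}), params


f, params = expression_to_function("a+b*math.exp(-c*x+math.pi)")
value = f(math.pi, 1.0, 2.0, 1.0)  # 1 + 2*exp(-pi + pi)
```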
-
yahist.fit.
fit_hist
(func, hist, nsamples=500, extent=None, ax=None, draw=True, color='red', legend=True, label='fit $\\pm$1$\\sigma$', band_style='filled', likelihood=False, curve_fit_kwargs={})[source]¶ Fits a function to a histogram via scipy.optimize.curve_fit, calculating a 1-sigma band, and optionally plotting it. Note that this does not support asymmetric errors. It will symmetrize such errors prior to fitting. Empty bins are excluded from the fit.
- Parameters
func (function taking x data as the first argument, followed by parameters, or a string) –
hist (Hist1D) –
nsamples (int, default 500) – number of samples/bootstraps for calculating error bands
ax (matplotlib AxesSubplot object, default None) –
band_style (None/str, default "filled") – if not None, compute and display uncertainty band. Possible strings are “filled”, “dashed”, “dotted”, “dashdot”, “solid”
draw (bool, default True) – draw to a specified or pre-existing AxesSubplot object
color (str, default "red") – color of fit line and error band
curve_fit_kwargs (dict) – dict of extra kwargs to pass to scipy.optimize.curve_fit
extent (2-tuple, default None) – if 2-tuple, these are used with Hist1D.restrict() to fit only a subset of the x-axis (but still draw the full range)
label (str, default r"fit $\pm$1$\sigma$") – legend entry label. Parameters will be appended unless this is empty.
legend (bool, default True) – if True and the histogram has a label, draw the legend
- Returns
dict containing
parameter names, values, errors (sqrt of diagonal of the cov. matrix)
chi2, ndof of fit
a Hist1D object containing the fit
- Return type
dict
Example
>>> h = Hist1D(np.random.random(1000), bins="30,0,1.5")
>>> h.plot(errors=True, color="k")
>>> res = fit_hist(lambda x,a,b: a+b*x, h) # or fit_hist("a+b*x", h)
>>> print(res["parnames"],res["parvalues"],res["parerrors"])
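To make the returned quantities concrete, the sketch below performs a closed-form weighted least-squares fit of a straight line to binned data, skips empty bins, and reports parameter errors as the square root of the covariance diagonal. It is a stand-in for the scipy.optimize.curve_fit call, not the library's code; fit_line and the "chi2" and "ndof" keys are assumptions, while "parnames", "parvalues", and "parerrors" follow the example above.

```python
import math


def fit_line(centers, counts, errors):
    """Weighted least-squares fit of counts = a + b*x, ignoring empty bins."""
    pts = [(x, y, e) for x, y, e in zip(centers, counts, errors) if y != 0 and e > 0]
    s = sum(1 / e ** 2 for _, _, e in pts)
    sx = sum(x / e ** 2 for x, _, e in pts)
    sy = sum(y / e ** 2 for _, y, e in pts)
    sxx = sum(x * x / e ** 2 for x, _, e in pts)
    sxy = sum(x * y / e ** 2 for x, y, e in pts)
    delta = s * sxx - sx ** 2
    a = (sxx * sy - sx * sxy) / delta
    b = (s * sxy - sx * sy) / delta
    # Diagonal of the covariance matrix for (a, b).
    var_a, var_b = sxx / delta, s / delta
    chi2 = sum(((y - (a + b * x)) / e) ** 2 for x, y, e in pts)
    return {
        "parnames": ["a", "b"],
        "parvalues": [a, b],
        "parerrors": [math.sqrt(var_a), math.sqrt(var_b)],
        "chi2": chi2,            # assumed key name
        "ndof": len(pts) - 2,    # assumed key name
    }


centers = [0.5, 1.5, 2.5, 3.5]
counts = [2.0, 4.0, 0.0, 8.0]              # third bin is empty and is skipped
errors = [math.sqrt(c) for c in counts]    # symmetric Poisson-like errors
result = fit_line(centers, counts, errors)
```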
yahist.hist1d module¶
-
class
yahist.hist1d.
Hist1D
(obj=[], **kwargs)[source]¶ Bases:
object
Constructs a Hist1D object from a variety of inputs
- Parameters
obj (a list/array of numbers to histogram, or another Hist1D object) –
kwargs –
bins (list/array of bin edges, number of bins, string, or “auto”, default “auto”) – Follows usage for np.histogram, with addition of string specification
range (list/array of axis ranges, default None) – Follows usage for np.histogram
weights (list/array of weights, default None) – Follows usage for np.histogram
threads (int, default 1) – Number of threads to use for histogramming.
overflow (bool, default True) – Include overflow counts in outermost bins
metadata (dict, default {}) – Attach arbitrary extra data to this object
- Returns
- Return type
Examples
>>> x = np.random.normal(0, 1, 1000)
>>> Hist1D(x, bins=np.linspace(-5,5,11))
>>> Hist1D(x, bins="10,-5,5")
>>> Hist1D(x, bins="10,-5,5,20,-3,3")
>>> h1 = Hist1D(label="foo", color="C0")
>>> h1 = Hist1D(h1, label="bar", color="C1")
>>> Hist1D([], metadata=dict(foo=1))
-
property
bin_centers
¶ Returns the midpoints of bin edges.
- Returns
Bin centers
- Return type
array
-
property
bin_widths
¶ Returns the widths of bins.
- Returns
Bin widths
- Return type
array
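As a small illustration of the two properties above, both quantities follow directly from the bin edges. The edge values here are made up for the example.

```python
# Bin edges for three bins of unequal width (illustrative values).
edges = [0.0, 1.0, 2.0, 4.0]

# Midpoint of each pair of neighbouring edges.
bin_centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]

# Distance between each pair of neighbouring edges.
bin_widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
```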
-
property
counts
¶
-
cumulative
(forward=True)[source]¶ Turns Hist object into one with cumulative counts.
- Parameters
forward (bool, default True) – If true, sum the x-axis from low to high, otherwise high to low
- Returns
- Return type
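The effect of the forward flag can be sketched on a bare list of counts. cumulative_counts is a hypothetical helper used only to illustrate the two summation directions; it does not handle errors.

```python
from itertools import accumulate


def cumulative_counts(counts, forward=True):
    """Running sum from low to high x (forward) or from high to low."""
    if forward:
        return list(accumulate(counts))
    # Reverse, accumulate, and reverse back to sum from the high edge.
    return list(accumulate(counts[::-1]))[::-1]


low_to_high = cumulative_counts([1, 2, 3])
high_to_low = cumulative_counts([1, 2, 3], forward=False)
```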
-
property
dim
¶ Returns the number of dimensions. Hist1D returns 1, Hist2D returns 2
- Returns
Number of dimensions
- Return type
int
-
divide
(**kw)¶
-
property
edges
¶
-
property
errors
¶
-
property
errors_down
¶
-
property
errors_up
¶
-
fill
(obj, weights=None)[source]¶ Fills a Hist1D/Hist2D in place.
- Parameters
obj – Object to fill, with same definition as class construction
weights (list/array of weights, default None) – See class constructor
Example
>>> h = Hist1D(bins="10,0,10", label="test")
>>> h.fill([1,2,3,4])
>>> h.fill([0,1,2])
>>> h.median()
2.5
-
fit
(func, **kwargs)[source]¶ Fits a function to a histogram via scipy.optimize.curve_fit, calculating a 1-sigma band, and optionally plotting it. Note that this does not support asymmetric errors. It will symmetrize such errors prior to fitting. Empty bins are excluded from the fit.
- Parameters
func (function taking x data as the first argument, followed by parameters, or a string) –
hist (Hist1D) –
nsamples (int, default 500) – number of samples/bootstraps for calculating error bands
ax (matplotlib AxesSubplot object, default None) –
band_style (None/str, default "filled") – if not None, compute and display uncertainty band. Possible strings are “filled”, “dashed”, “dotted”, “dashdot”, “solid”
draw (bool, default True) – draw to a specified or pre-existing AxesSubplot object
color (str, default "red") – color of fit line and error band
curve_fit_kwargs (dict) – dict of extra kwargs to pass to scipy.optimize.curve_fit
extent (2-tuple, default None) – if 2-tuple, these are used with Hist1D.restrict() to fit only a subset of the x-axis (but still draw the full range)
label (str, default r"fit $\pm$1$\sigma$") – legend entry label. Parameters will be appended unless this is empty.
legend (bool, default True) – if True and the histogram has a label, draw the legend
- Returns
dict containing
parameter names, values, errors (sqrt of diagonal of the cov. matrix)
chi2, ndof of fit
a Hist1D object containing the fit
- Return type
dict
Example
>>> h = Hist1D(np.random.random(1000), bins="30,0,1.5")
>>> h.plot(errors=True, color="k")
>>> res = fit_hist(lambda x,a,b: a+b*x, h) # or fit_hist("a+b*x", h)
>>> print(res["parnames"],res["parvalues"],res["parerrors"])
-
classmethod
from_bincounts
(counts, bins=None, errors=None, **kwargs)[source]¶ Creates histogram object from array of histogrammed counts, edges/bins, and optionally errors.
- Parameters
counts (array) – Array of bin counts
bins (array, default None) – Array of bin edges. If not specified for Hist1D, uses bins = np.arange(len(counts)+1).
errors (array, default None) – Array of bin errors (optional)
**kwargs – Parameters to be passed to Hist1D/Hist2D constructor.
- Returns
- Return type
Hist
-
classmethod
from_json
(obj)[source]¶ Converts serialized json to histogram object.
- Parameters
obj (str) – json-serialized object from self.to_json() or file path
- Returns
- Return type
Hist
-
classmethod
from_random
(which='norm', params=[0.0, 1.0], size=100000.0, random_state=None, **kwargs)[source]¶ Creates histogram object from random values of a given distribution within scipy.stats
- Parameters
which (str, default "norm") – Distribution within scipy.stats
params (list/array, default [0, 1]) – Parameters to distribution
size (int/float, default 1e5) – Number of random values to sample/fill histogram
random_state (int, default None) –
- Returns
- Return type
Hist
-
html_table
(suppress=True)[source]¶ Return HTML table tag with bin contents (counts and errors) compactly formatted. Only the four leftmost and rightmost bins are shown, while the rest are hidden.
- Parameters
suppress (bool, default True) – if True, hide middle bins/rows
- Returns
- Return type
str
-
property
integral
¶ Returns the integral of the histogram (sum of counts).
- Returns
Sum of counts
- Return type
float
-
property
integral_error
¶ Returns the error of the integral of the histogram
- Returns
Error on integral
- Return type
float
-
lookup
(x)[source]¶ Convert a specified list of x-values into corresponding bin counts via np.digitize
- Parameters
x (array of x-values, or single x-value) –
- Returns
- Return type
array
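The lookup idea can be illustrated without numpy by using bisect in place of np.digitize. In this sketch, values outside the axis range are clamped into the outermost bins; that clamping is an assumption, and lookup_counts is a hypothetical helper.

```python
from bisect import bisect_right


def lookup_counts(edges, counts, xs):
    """Return the count of the bin that each x value falls into."""
    out = []
    for x in xs:
        index = bisect_right(edges, x) - 1
        # Assumption: out-of-range values map to the first/last bin.
        index = min(max(index, 0), len(counts) - 1)
        out.append(counts[index])
    return out


looked_up = lookup_counts([0, 1, 2, 3], [5, 7, 9], [0.5, 1.0, 2.9, 10, -1])
```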
-
median
()[source]¶ Returns the bin center closest to the median of the histogram.
- Returns
median
- Return type
float
-
property
metadata
¶
-
mode
()[source]¶ Returns mode (bin center for bin with largest value). If multiple bins are tied, only the first/leftmost is returned.
- Returns
mode
- Return type
float
-
property
nbins
¶ Returns the number of bins
- Returns
Number of bins
- Return type
int
-
property
nbytes
¶ Returns sum of nbytes of underlying numpy arrays
- Returns
Number of bytes of underlying numpy arrays
- Return type
int
-
normalize
(density=False)[source]¶ Divides counts of each bin by the sum of the total counts. If density=True, also divide by bin widths.
- Returns
- Return type
Hist
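The arithmetic behind normalization can be sketched on plain lists. This illustrates the description above and is not the library's implementation; normalize_counts is a hypothetical name and errors are not propagated.

```python
def normalize_counts(counts, edges, density=False):
    """Divide counts by their sum and, if density is True, by bin widths."""
    total = sum(counts)
    out = [c / total for c in counts]
    if density:
        out = [v / (hi - lo) for v, lo, hi in zip(out, edges[:-1], edges[1:])]
    return out


fractions = normalize_counts([1, 3], [0, 1, 3])
densities = normalize_counts([1, 3], [0, 1, 3], density=True)
```

With density=True the values integrate to 1 over the axis (the sum of value times bin width), whereas the default makes the values themselves sum to 1.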
-
plot
(ax=None, histtype='step', legend=True, counts=False, errors=False, fmt='o', label=None, color=None, counts_formatter=<built-in method format of str object>, counts_fontsize=10, interactive=False, **kwargs)[source]¶ Plot this histogram object using matplotlib’s hist function, or errorbar (depending on the value of the errors argument).
- Parameters
ax (matplotlib AxesSubplot object, default None) – matplotlib AxesSubplot object. Created if None.
color (str, default None) – If None, uses default matplotlib color cycler
counts (bool, default False) – If True, show text labels for counts (and/or errors). See counts_formatter and counts_fontsize.
counts_formatter (callable, default “{:3g}”.format) – Two-parameter function used to format count and error labels. Thus, if a second placeholder is specified (e.g., “{:3g} +- {:3g}”.format), the bin error can be shown as well.
counts_fontsize (int, default 10) – Font size of count labels
errors (bool, default False) – If True, plot markers with error bars (ax.errorbar()) instead of ax.hist().
fmt (str, default "o") – fmt kwarg used for matplotlib plotting
label (str, default None) – Label for legend entry
interactive (bool, default False) – Use plotly to make an interactive plot. See Hist1D.plot_plotly().
legend (bool, default True) – If True and the histogram has a label, draw the legend
**kwargs – Parameters to be passed to matplotlib’s errorbar (if errors=True) or hist (otherwise) function.
- Returns
- Return type
matplotlib AxesSubplot object
-
plot_plotly
(fig=None, color=None, errors=False, log=False, label=None, flipxy=False, alpha=1, stack=False, **kwargs)[source]¶
-
quantile
(q)[source]¶ Returns the bin center corresponding to the quantile(s) q. Similar to np.quantile.
- Parameters
q (float, or array of floats) – quantile between 0 and 1
- Returns
- Return type
float, or array of floats
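A quantile of binned data can be estimated from the running sum of counts. The sketch below returns the center of the first bin whose cumulative count reaches the requested fraction; the library's exact selection rule may differ, and quantile_bin_center is a hypothetical helper. The counts reproduce the fill() example above, where the median is 2.5.

```python
from itertools import accumulate


def quantile_bin_center(counts, edges, q):
    """Center of the first bin whose cumulative count reaches q * total."""
    total = sum(counts)
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    for center, running in zip(centers, accumulate(counts)):
        if running >= q * total:
            return center
    return centers[-1]


# Ten unit-width bins on [0, 10] filled with [1,2,3,4] and then [0,1,2].
counts = [1, 2, 2, 1, 1, 0, 0, 0, 0, 0]
edges = list(range(11))
median = quantile_bin_center(counts, edges, 0.5)
```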
-
rebin
(nrebin)[source]¶ Combines adjacent bins by summing contents. The total number of bins for each axis must be exactly divisible by nrebin.
- Parameters
nrebin (int) – Number of adjacent bins to combine into one bin.
- Returns
- Return type
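Rebinning can be sketched as summing fixed-size groups of adjacent bins. Adding errors in quadrature is an assumption of this sketch, and rebin_counts is a hypothetical helper rather than the method itself.

```python
import math


def rebin_counts(counts, errors, edges, nrebin):
    """Merge every nrebin adjacent bins into one."""
    if len(counts) % nrebin != 0:
        raise ValueError("number of bins must be divisible by nrebin")
    starts = range(0, len(counts), nrebin)
    new_counts = [sum(counts[i:i + nrebin]) for i in starts]
    # Assumption: independent bin errors combine in quadrature.
    new_errors = [math.sqrt(sum(e ** 2 for e in errors[i:i + nrebin])) for i in starts]
    new_edges = edges[::nrebin]  # keep every nrebin-th edge, including the last
    return new_counts, new_errors, new_edges


new_counts, new_errors, new_edges = rebin_counts(
    [1, 2, 3, 4], [3.0, 4.0, 0.0, 0.0], [0, 1, 2, 3, 4], 2
)
```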
-
restrict
(low=None, high=None)[source]¶ Restricts to a contiguous subset of bins with bin center values within [low, high]. If low/high is None, there is no lower/upper bound
- Parameters
low (float (default None)) – Lower x center to keep
high (float (default None)) – Highest x center to keep
- Returns
- Return type
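The selection rule can be sketched as follows: a bin is kept when its center lies within [low, high], and the edges are trimmed to match. restrict_bins is a hypothetical helper operating on plain lists.

```python
def restrict_bins(counts, edges, low=None, high=None):
    """Keep the contiguous bins whose centers lie within [low, high]."""
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    keep = [
        i for i, c in enumerate(centers)
        if (low is None or c >= low) and (high is None or c <= high)
    ]
    if not keep:
        return [], []
    first, last = keep[0], keep[-1]
    # n kept bins need n + 1 edges.
    return counts[first:last + 1], edges[first:last + 2]


kept_counts, kept_edges = restrict_bins([1, 2, 3, 4], [0, 1, 2, 3, 4], low=1, high=3)
```

Here the bins centered at 0.5 and 3.5 are dropped, because the comparison is made on bin centers rather than bin edges.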
-
sample
(size=100000.0)[source]¶ Returns an array of random samples according to a discrete pdf from this histogram.
- Parameters
size (int/float, default 1e5) – Number of random values to sample
- Returns
- Return type
array
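Sampling from a histogram treated as a discrete pdf can be sketched with the standard library by drawing bin centers with probability proportional to the counts. Whether the library returns bin centers or spreads values within each bin is not stated here, so this is only an assumption; sample_from_histogram and its seed argument are hypothetical.

```python
import random


def sample_from_histogram(counts, edges, size, seed=None):
    """Draw bin centers with probability proportional to the bin counts."""
    rng = random.Random(seed)
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return rng.choices(centers, weights=counts, k=int(size))


# The middle bin is empty, so its center is never drawn.
samples = sample_from_histogram([1, 0, 3], [0, 1, 2, 3], size=1e3, seed=42)
```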
-
std
()[source]¶ Returns the standard deviation of the histogram
- Returns
standard deviation of histogram (or, RMS)
- Return type
float
-
svg
(**kwargs)[source]¶ Return HTML svg tag with Matplotlib-rendered svg.
- Parameters
**kwargs – Parameters to be passed to self.plot() function.
- Returns
- Return type
str
-
svg_fast
(height=250, aspectratio=1.4, padding=0.02, strokewidth=1, color=None, bottom=True, frame=True)[source]¶ Return HTML svg tag with bare-bones version of histogram (no ticks, labels).
- Parameters
height (int, default 250) – Height of plot in pixels
padding (float, default 0.02) – Fraction of height or width to keep between edges of plot and svg view size
aspectratio (float, default 1.4) – Aspect ratio of plot
strokewidth (float, default 1) – Width of strokes
bottom (bool, default True) – Draw line at the bottom
color (str, default None) – Stroke color and fill color (with 15% opacity). If color is in the histogram metadata, it will take precedence.
frame (bool, default True) – Draw frame/border
- Returns
- Return type
str
yahist.hist2d module¶
-
class
yahist.hist2d.
Hist2D
(obj=[], **kwargs)[source]¶ Bases:
yahist.hist1d.Hist1D
Constructs a Hist2D object from a variety of inputs
- Parameters
obj (a 2D array of numbers to histogram, another Hist2D object, or a tuple of x and y values) –
kwargs –
bins (list/array of bin edges, number of bins, or string, default 10) – Follows usage for np.histogram2d, with addition of string specification
range (list/array of axis ranges, default None) – Follows usage for np.histogram2d
weights (list/array of weights, default None) – Follows usage for np.histogram2d
threads (int, default 0) – Number of threads to use for histogramming. If 0, autodetect (within boost_histogram)
overflow (bool, default True) – Include overflow counts in outermost bins
metadata (dict, default {}) – Attach arbitrary extra data to this object
- Returns
- Return type
Examples
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> Hist2D(np.c_[x,y], bins=np.linspace(-5,5,11))
>>> Hist2D((x,y), bins="10,-5,5")
>>> Hist2D((x,y), bins="10,-5,5,20,-3,3")
>>> df = pd.DataFrame(dict(x=x,y=y))
>>> Hist2D(df[["x","y"]], bins="10,-5,5", threads=4)
-
property
bin_centers
¶ Returns the centers of bins.
- Returns
Bin centers
- Return type
array
-
property
bin_widths
¶ Returns the widths of bins.
- Returns
Bin widths
- Return type
array
-
canvas
(height=250, aspectratio=1.4)[source]¶ Return HTML5 canvas tag similar to self.svg().
- Parameters
height (int, default 250) – Height of plot in pixels
aspectratio (float, default 1.4) – Aspect ratio of plot
- Returns
- Return type
str
-
correlation
()[source]¶ Returns the correlation factor between the x and y axes, matching the routine in https://root.cern.ch/doc/master/TH2_8cxx_source.html#l01044
- Returns
- Return type
float
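A correlation factor for binned 2D data can be sketched as a weighted Pearson coefficient computed from bin centers, with bin counts as weights. This is an illustration rather than a reproduction of the ROOT routine; the counts[iy][ix] layout and the helper name binned_correlation are assumptions.

```python
import math


def binned_correlation(counts, xcenters, ycenters):
    """Weighted Pearson correlation, using bin counts as weights."""
    # Flatten to (weight, x, y) triples; counts[iy][ix] layout is assumed.
    cells = [
        (counts[iy][ix], x, y)
        for iy, y in enumerate(ycenters)
        for ix, x in enumerate(xcenters)
    ]
    total = sum(w for w, _, _ in cells)
    mean_x = sum(w * x for w, x, _ in cells) / total
    mean_y = sum(w * y for w, _, y in cells) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in cells) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x, _ in cells) / total
    var_y = sum(w * (y - mean_y) ** 2 for w, _, y in cells) / total
    return cov / math.sqrt(var_x * var_y)


centers = [0.5, 1.5]
diagonal = binned_correlation([[1, 0], [0, 1]], centers, centers)
antidiagonal = binned_correlation([[0, 1], [1, 0]], centers, centers)
```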
-
cumulative
(forwardx=True, forwardy=True)[source]¶ Turns Hist object into one with cumulative counts.
- Parameters
forwardx (bool, default True) – If true, sum the x-axis from low to high, otherwise high to low. If None, do not sum along this axis.
forwardy (bool, default True) – If true, sum the y-axis from low to high, otherwise high to low. If None, do not sum along this axis.
- Returns
- Return type
-
lookup
(x, y)[source]¶ Convert a specified list of x-values and y-values into corresponding bin counts via np.digitize
- Parameters
x (array of x-values, or single x-value) –
y (array of y-values, or single y-value) –
- Returns
- Return type
array
-
property
nbins
¶ Returns the number of bins
- Returns
Number of bins
- Return type
int
-
plot
(ax=None, fig=None, colorbar=True, hide_empty=True, counts=False, counts_formatter=<built-in method format of str object>, counts_fontsize=12, logz=False, equidistant='', interactive=False, **kwargs)[source]¶ Plot this histogram object using matplotlib’s pcolorfast/pcolormesh function.
- Parameters
ax (matplotlib AxesSubplot object, default None) – matplotlib AxesSubplot object. Created if None.
fig (matplotlib Figure object, default None) – matplotlib Figure object. Created if None.
counts (bool, default False) – If True, show text labels for counts (and/or errors). See counts_formatter and counts_fontsize.
counts_formatter (callable, default “{:3g}”.format) – Two-parameter function used to format count and error labels. Thus, if a second placeholder is specified (e.g., “{:3g}\n$\pm$ {:3g}”.format), the bin error can be shown as well.
counts_fontsize (int, default 12) – Font size of count labels
colorbar (bool, default True) – Show colorbar
equidistant (str, default "") – If not an empty string, make bins equally-spaced in the x-axis (equidistant=”x”), y-axis (“y”), or both (“xy”).
hide_empty (bool, default True) – Don’t draw empty bins (content==0)
interactive (bool, default False) – Use plotly to make an interactive plot
logz (bool, default False) – Use logscale for z-axis
**kwargs – Parameters to be passed to matplotlib pcolorfast/pcolormesh function.
- Returns
- Return type
2-tuple of (pcolorfast/pcolormesh output, matplotlib AxesSubplot object)
-
profile
(axis='x')[source]¶ Returns the x/y-profile of the 2d histogram by calculating the weighted mean over the y/x-axis.
- Parameters
axis (str/int (default "x")) – if “x” or 0, return the x-profile (mean over y-axis); if “y” or 1, return the y-profile (mean over x-axis)
- Returns
- Return type
-
projection
(axis='x')[source]¶ Returns the x/y-projection of the 2d histogram by summing over the y/x-axis.
- Parameters
axis (str/int (default "x")) – if “x” or 0, return the x-projection (summing over y-axis); if “y” or 1, return the y-projection (summing over x-axis)
- Returns
- Return type
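The axis convention can be illustrated on a nested list of counts. The counts[iy][ix] layout is an assumption of this sketch, and project_counts is a hypothetical helper that ignores errors.

```python
def project_counts(counts, axis="x"):
    """Sum a 2D grid of counts onto one axis (counts[iy][ix] layout assumed)."""
    if axis in ("x", 0):
        # x-projection: sum over the y-axis, one value per x bin.
        return [sum(column) for column in zip(*counts)]
    if axis in ("y", 1):
        # y-projection: sum over the x-axis, one value per y bin.
        return [sum(row) for row in counts]
    raise ValueError("axis must be 'x', 'y', 0, or 1")


grid = [[1, 2, 3],
        [4, 5, 6]]  # 2 y bins, 3 x bins
x_projection = project_counts(grid, "x")
y_projection = project_counts(grid, "y")
```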
-
rebin
(nrebinx, nrebiny=None)[source]¶ Combines adjacent bins by summing contents. The total number of bins for the x-axis (y-axis) must be exactly divisible by nrebinx (nrebiny). Based on the method in https://stackoverflow.com/questions/44527579/whats-the-best-way-to-downsample-a-numpy-array.
- Parameters
nrebinx (int) – Number of adjacent x-axis bins to combine into one bin.
nrebiny (int) – Number of adjacent y-axis bins to combine into one bin.
- Returns
- Return type
-
restrict
(xlow=None, xhigh=None, ylow=None, yhigh=None)[source]¶ Restricts to a contiguous subset of bins with bin center values within [[xlow, xhigh], [ylow,yhigh]]. If any limit is None, the specified direction will be unbounded.
- Parameters
xlow (float (default None)) – Lower x center to keep
xhigh (float (default None)) – Highest x center to keep
ylow (float (default None)) – Lower y center to keep
yhigh (float (default None)) – Highest y center to keep
- Returns
- Return type
-
sample
(size=100000.0)[source]¶ Returns a 2-column array of random samples according to a discrete pdf from this histogram.
>>> h1 = Hist2D.from_random()
>>> h2 = Hist2D(h1.sample(100), bins=h1.edges)
- Parameters
size (int/float, default 1e5) – Number of random values to sample
- Returns
- Return type
array
-
smooth
(ntimes=3, window=3)[source]¶ Returns a smoothed Hist2D via convolution with three kernels used by https://root.cern.ch/doc/master/TH2_8cxx_source.html#l02600
- Parameters
ntimes (int (default 3)) – Number of times to repeat smoothing
window (int (default 3)) – Kernel size (1, 3, 5 supported)
- Returns
- Return type
-
svg
(ticks=True, **kwargs)[source]¶ Return HTML svg tag with Matplotlib-rendered svg.
- Parameters
ticks (bool, default True) – Show x/y ticks and labels
**kwargs – Parameters to be passed to self.plot() function.
- Returns
- Return type
str
-
svg_fast
(height=250, aspectratio=1.4, interactive=False)[source]¶ Return HTML svg tag with bare-bones version of histogram (no ticks, labels).
- Parameters
height (int, default 250) – Height of plot in pixels
aspectratio (float, default 1.4) – Aspect ratio of plot
interactive (bool, default False) – Whether to display bin contents on mouse hover.
- Returns
- Return type
str
yahist.utils module¶
-
yahist.utils.
binomial_obs_z
(data, bkg, bkgerr)[source]¶ Calculates pull values following the routine at https://root.cern.ch/doc/v606/NumberCountingUtils_8cxx_source.html#l00137 (the scipy version is vectorized, so arrays can be passed in). If gaussian_fallback, return a simple Gaussian pull when the data count is 0; otherwise both ROOT and scipy will return inf/nan.
-
yahist.utils.
clopper_pearson_error
(passed, total, level=0.6827)[source]¶ Matches TEfficiency::ClopperPearson():
>>> ROOT.TEfficiency.ClopperPearson(total, passed, level, is_upper)
-
yahist.utils.
plot_stack
(hists, **kwargs)[source]¶ Plots a list of Hist1D objects as a stack
- Parameters
hists (list of Hist1D objects) –
kwargs (passed to Hist1D.plot()) –
-
yahist.utils.
poisson_errors
(obs, alpha=0.3173)[source]¶ Return poisson low and high values for a series of data observations