yahist package

Submodules

yahist.fit module

yahist.fit.calculate_hessian(func, x0, epsilon=1e-05)[source]

A numerical approximation to the Hessian matrix of the cost function at location x0. Taken almost verbatim from https://gist.github.com/jgomezdans/3144636
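
As a rough illustration of what a numerical Hessian involves, the sketch below uses central finite differences in plain Python. It is not the code from the gist or from yahist; the name numerical_hessian and the list-based interface are assumptions made for this example.

```python
def numerical_hessian(func, x0, epsilon=1e-5):
    """Approximate the Hessian of func at x0 with central finite differences."""
    n = len(x0)
    hessian = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j, dj):
        # Evaluate func with coordinates i and j displaced by di and dj.
        x = list(x0)
        x[i] += di
        x[j] += dj
        return func(x)

    for i in range(n):
        for j in range(i, n):
            value = (
                shifted(i, epsilon, j, epsilon)
                - shifted(i, epsilon, j, -epsilon)
                - shifted(i, -epsilon, j, epsilon)
                + shifted(i, -epsilon, j, -epsilon)
            ) / (4.0 * epsilon**2)
            # The Hessian is symmetric, so fill both triangles at once.
            hessian[i][j] = hessian[j][i] = value
    return hessian


# Quadratic test function whose exact Hessian is [[2, 3], [3, 4]].
quad = lambda p: p[0] ** 2 + 3 * p[0] * p[1] + 2 * p[1] ** 2
H = numerical_hessian(quad, [1.0, 2.0])
```

Smaller values of epsilon reduce truncation error but amplify floating-point round-off, so the default is a compromise.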

yahist.fit.curve_fit_wrapper(func, xdata, ydata, sigma=None, absolute_sigma=True, likelihood=False, **kwargs)[source]

Wrapper around scipy.optimize.curve_fit. Initial parameters (p0) can be set in the function definition as keyword-argument defaults (e.g., func = lambda x,a=1.,b=2.: x+a+b will feed p0 = [1.,2.] to curve_fit).
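
One way such defaults could be read off a function is with the standard inspect module. This is a minimal sketch of the idea, not the wrapper's actual code; initial_parameters is a hypothetical helper name.

```python
import inspect


def initial_parameters(func):
    """Collect the default values of every argument after the first (the x data)."""
    params = list(inspect.signature(func).parameters.values())[1:]
    # Arguments without a default are skipped in this sketch.
    return [p.default for p in params if p.default is not inspect.Parameter.empty]


func = lambda x, a=1.0, b=2.0: x + a + b
p0 = initial_parameters(func)
```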

yahist.fit.expr_to_lambda(expr)[source]
Converts a string expression like

“a+b*np.exp(-c*x+math.pi)”

into a lambda function with 1 variable and N parameters,

lambda x,a,b,c: a+b*np.exp(-c*x+math.pi)

x is assumed to be the main variable. The parsing logic is very simple: names such as foo.bar or foo( are not treated as parameters.

Parameters

expr (str) –

Returns

Return type

callable/lambda
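
The conversion described above can be approximated with a regular expression. The sketch below is an assumption about how such logic might look, not yahist's implementation; expr_to_lambda_sketch and its namespace argument are hypothetical, and only the math module is made available to the expression.

```python
import math
import re


def expr_to_lambda_sketch(expr, namespace=None):
    """Build a lambda of x plus every bare name found in expr."""
    # A bare name is not preceded by a dot or word character and is not
    # followed by a dot or "(" (so math.exp and foo( are ignored).
    pattern = r"(?<![\w.])([A-Za-z_]\w*)(?!\s*[\w.(])"
    names = []
    for name in re.findall(pattern, expr):
        if name != "x" and name not in names:
            names.append(name)
    source = "lambda {}: {}".format(",".join(["x"] + names), expr)
    return eval(source, namespace or {"math": math}), names


f, parnames = expr_to_lambda_sketch("a+b*math.exp(-c*x)")
```

Parameters appear in the lambda in the order in which they first occur in the string, which matters when passing positional values.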

yahist.fit.fit_hist(func, hist, nsamples=500, extent=None, ax=None, draw=True, color='red', legend=True, label='fit $\\pm$1$\\sigma$', band_style='filled', likelihood=False, curve_fit_kwargs={})[source]

Fits a function to a histogram via scipy.optimize.curve_fit, calculating a 1-sigma band, and optionally plotting it. Note that this does not support asymmetric errors. It will symmetrize such errors prior to fitting. Empty bins are excluded from the fit.

Parameters
  • func (function taking x data as the first argument, followed by parameters, or a string) –

  • hist (Hist1D) –

  • nsamples (int, default 500) – number of samples/bootstraps for calculating error bands

  • ax (matplotlib AxesSubplot object, default None) –

  • band_style (None/str, default "filled") – if not None, compute and display uncertainty band. Possible strings are “filled”, “dashed”, “dotted”, “dashdot”, “solid”

  • draw (bool, default True) – draw to a specified or pre-existing AxesSubplot object

  • color (str, default "red") – color of fit line and error band

  • curve_fit_kwargs (dict) – dict of extra kwargs to pass to scipy.optimize.curve_fit

  • extent (2-tuple, default None) – if 2-tuple, these are used with Hist1D.restrict() to fit only a subset of the x-axis (but still draw the full range)

  • label (str, default r"fit $\pm$1$\sigma$") – legend entry label. Parameters will be appended unless this is empty.

  • legend (bool, default True) – if True and the histogram has a label, draw the legend

Returns

  • parameter names, values, errors (sqrt of diagonal of the cov. matrix)

  • chi2, ndof of fit

  • a Hist1D object containing the fit

Return type

dict of

Example

>>> h = Hist1D(np.random.random(1000), bins="30,0,1.5")
>>> h.plot(errors=True, color="k")
>>> res = fit_hist(lambda x,a,b: a+b*x, h) # or fit_hist("a+b*x", h)
>>> print(res["parnames"],res["parvalues"],res["parerrors"])
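
To make the returned quantities concrete, here is a plain-Python sketch of a weighted straight-line fit that skips empty bins and reports parameter errors as the square root of the covariance diagonal. It uses the closed-form least-squares solution rather than scipy.optimize.curve_fit, and the "chi2" and "ndof" keys are assumed names chosen for this illustration.

```python
import math


def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x; empty bins (y == 0) are skipped."""
    points = [(x, y, s) for x, y, s in zip(xs, ys, sigmas) if y != 0]
    w = [1.0 / s**2 for _, _, s in points]
    S = sum(w)
    Sx = sum(wi * x for wi, (x, _, _) in zip(w, points))
    Sy = sum(wi * y for wi, (_, y, _) in zip(w, points))
    Sxx = sum(wi * x * x for wi, (x, _, _) in zip(w, points))
    Sxy = sum(wi * x * y for wi, (x, y, _) in zip(w, points))
    det = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / det
    b = (S * Sxy - Sx * Sy) / det
    # Diagonal elements of the covariance matrix of (a, b).
    var_a, var_b = Sxx / det, S / det
    chi2 = sum(((y - (a + b * x)) / s) ** 2 for x, y, s in points)
    return {
        "parnames": ["a", "b"],
        "parvalues": [a, b],
        "parerrors": [math.sqrt(var_a), math.sqrt(var_b)],
        "chi2": chi2,
        "ndof": len(points) - 2,  # fitted points minus free parameters
    }


# Points on y = 1 + 2x, with one empty bin at x = 2.
res = fit_line([0, 1, 2, 3, 4], [1, 3, 0, 7, 9], [1, 1, 1, 1, 1])
```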

yahist.hist1d module

class yahist.hist1d.Hist1D(obj=[], **kwargs)[source]

Bases: object

Constructs a Hist1D object from a variety of inputs

Parameters
  • obj (a list/array of numbers to histogram, or another Hist1D object) –

  • kwargs

    bins: list/array of bin edges, number of bins, string, or “auto”, default “auto”

    Follows usage for np.histogram, with addition of string specification

    range: list/array of axis ranges, default None

    Follows usage for np.histogram

    weights: list/array of weights, default None

    Follows usage for np.histogram

    threads: int, default 1

    Number of threads to use for histogramming.

    overflow: bool, default True

    Include overflow counts in outermost bins

    metadata: dict, default {}

    Attach arbitrary extra data to this object

Returns

Return type

Hist1D

Examples

>>> x = np.random.normal(0, 1, 1000)
>>> Hist1D(x, bins=np.linspace(-5,5,11))
>>> Hist1D(x, bins="10,-5,5")
>>> Hist1D(x, bins="10,-5,5,20,-3,3")
>>> h1 = Hist1D(label="foo", color="C0")
>>> h1 = Hist1D(h1, label="bar", color="C1")
>>> Hist1D([], metadata=dict(foo=1))
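
The string form of bins and the overflow option can be pictured with the plain-Python sketch below. It assumes that a string "n,low,high" means n equal-width bins between low and high, and that overflow is handled by clipping values into the axis range; both helper names are hypothetical and this is not yahist's code.

```python
import bisect


def edges_from_string(spec):
    """Turn "nbins,low,high" into a list of nbins+1 equally spaced edges."""
    nbins, low, high = spec.split(",")
    nbins, low, high = int(nbins), float(low), float(high)
    width = (high - low) / nbins
    return [low + i * width for i in range(nbins + 1)]


def histogram(values, edges, overflow=True):
    """Count values per bin; with overflow=True, outliers land in the outermost bins."""
    counts = [0] * (len(edges) - 1)
    low, high = edges[0], edges[-1]
    for v in values:
        if overflow:
            v = min(max(v, low), high)  # clip into the axis range
        if not (low <= v <= high):
            continue  # out of range and overflow disabled
        # The last edge is inclusive, as in np.histogram.
        idx = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


edges = edges_from_string("10,-5,5")
with_overflow = histogram([-7, 0.5, 9], edges_from_string("2,0,2"), overflow=True)
without_overflow = histogram([-7, 0.5, 9], edges_from_string("2,0,2"), overflow=False)
```
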
property bin_centers

Returns the midpoints of bin edges.

Returns

Bin centers

Return type

array

property bin_widths

Returns the widths of bins.

Returns

Bin widths

Return type

array

copy()[source]
property counts
cumulative(forward=True)[source]

Turns Hist object into one with cumulative counts.

Parameters

forward (bool, default True) – If true, sum the x-axis from low to high, otherwise high to low

Returns

Return type

Hist1D
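
The effect of the forward flag on the counts can be sketched with itertools.accumulate. This illustrates the counts only (errors are not treated), and cumulative_counts is a hypothetical helper.

```python
from itertools import accumulate


def cumulative_counts(counts, forward=True):
    """Running sum of counts from low to high x (forward) or high to low."""
    if forward:
        return list(accumulate(counts))
    return list(accumulate(counts[::-1]))[::-1]


forward = cumulative_counts([1, 2, 3])
backward = cumulative_counts([1, 2, 3], forward=False)
```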

property dim

Returns the number of dimensions. Hist1D returns 1, Hist2D returns 2

Returns

Number of dimensions

Return type

int

divide(**kw)
property edges
property errors
property errors_down
property errors_up
fill(obj, weights=None)[source]

Fills a Hist1D/Hist2D in place.

Parameters
  • obj – Object to fill, with same definition as class construction

  • weights (list/array of weights, default None) – See class constructor

Example

>>> h = Hist1D(bins="10,0,10", label="test")
>>> h.fill([1,2,3,4])
>>> h.fill([0,1,2])
>>> h.median()
2.5
fit(func, **kwargs)[source]

Fits a function to a histogram via scipy.optimize.curve_fit, calculating a 1-sigma band, and optionally plotting it. Note that this does not support asymmetric errors. It will symmetrize such errors prior to fitting. Empty bins are excluded from the fit.

Parameters
  • func (function taking x data as the first argument, followed by parameters, or a string) –

  • hist (Hist1D) –

  • nsamples (int, default 500) – number of samples/bootstraps for calculating error bands

  • ax (matplotlib AxesSubplot object, default None) –

  • band_style (None/str, default "filled") – if not None, compute and display uncertainty band. Possible strings are “filled”, “dashed”, “dotted”, “dashdot”, “solid”

  • draw (bool, default True) – draw to a specified or pre-existing AxesSubplot object

  • color (str, default "red") – color of fit line and error band

  • curve_fit_kwargs (dict) – dict of extra kwargs to pass to scipy.optimize.curve_fit

  • extent (2-tuple, default None) – if 2-tuple, these are used with Hist1D.restrict() to fit only a subset of the x-axis (but still draw the full range)

  • label (str, default r"fit $\pm$1$\sigma$") – legend entry label. Parameters will be appended unless this is empty.

  • legend (bool, default True) – if True and the histogram has a label, draw the legend

Returns

  • parameter names, values, errors (sqrt of diagonal of the cov. matrix)

  • chi2, ndof of fit

  • a Hist1D object containing the fit

Return type

dict of

Example

>>> h = Hist1D(np.random.random(1000), bins="30,0,1.5")
>>> h.plot(errors=True, color="k")
>>> res = fit_hist(lambda x,a,b: a+b*x, h) # or fit_hist("a+b*x", h)
>>> print(res["parnames"],res["parvalues"],res["parerrors"])
classmethod from_bincounts(counts, bins=None, errors=None, **kwargs)[source]

Creates histogram object from array of histogrammed counts, edges/bins, and optionally errors.

Parameters
  • counts (array) – Array of bin counts

  • bins (array, default None) – Array of bin edges. If not specified for Hist1D, uses bins = np.arange(len(counts)+1).

  • errors (array, default None) – Array of bin errors (optional)

  • **kwargs – Parameters to be passed to Hist1D/Hist2D constructor.

Returns

Return type

Hist

classmethod from_json(obj)[source]

Converts serialized json to histogram object.

Parameters

obj (str) – json-serialized object from self.to_json() or file path

Returns

Return type

Hist

classmethod from_random(which='norm', params=[0.0, 1.0], size=100000.0, random_state=None, **kwargs)[source]

Creates histogram object from random values of a given distribution within scipy.stats

Parameters
  • which (str, default "norm") – Distribution within scipy.stats

  • params (list/array, default [0, 1]) – Parameters to distribution

  • size (int/float, default 1e5) – Number of random values to sample/fill histogram

  • random_state (int, default None) –

Returns

Return type

Hist

html_table(suppress=True)[source]

Return HTML table tag with bin contents (counts and errors) compactly formatted. Only the four leftmost and rightmost bins are shown, while the rest are hidden.

Parameters

suppress (bool, default True) – if True, hide middle bins/rows

Returns

Return type

str

property integral

Returns the integral of the histogram (sum of counts).

Returns

Sum of counts

Return type

float

property integral_error

Returns the error of the integral of the histogram

Returns

Error on integral

Return type

float

lookup(x)[source]

Convert a specified list of x-values into corresponding bin counts via np.digitize

Parameters

x (array of x-values, or single x-value) –

Returns

Return type

array
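
A value-to-count lookup of this kind can be sketched with the standard bisect module, which follows the same convention as np.digitize with its default right=False. Sending out-of-range values to the outermost bins is an assumption of this sketch, and lookup_count is a hypothetical name.

```python
import bisect


def lookup_count(counts, edges, x):
    """Return the count of the bin that contains x."""
    # bisect_right gives i with edges[i-1] <= x < edges[i], like np.digitize.
    idx = bisect.bisect_right(edges, x) - 1
    # Assumption: values outside the axis range map to the first/last bin.
    idx = min(max(idx, 0), len(counts) - 1)
    return counts[idx]


counts = [5, 7, 9]
edges = [0, 1, 2, 3]
looked_up = [lookup_count(counts, edges, x) for x in (0.5, 1.5, 10, -1)]
```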

mean()[source]

Returns the mean of the histogram

Returns

Mean of histogram

Return type

float

median()[source]

Returns the bin center closest to the median of the histogram.

Returns

median

Return type

float

property metadata
mode()[source]

Returns mode (bin center for bin with largest value). If multiple bins are tied, only the first/leftmost is returned.

Returns

mode

Return type

float

property nbins

Returns the number of bins

Returns

Number of bins

Return type

int

property nbytes

Returns sum of nbytes of underlying numpy arrays

Returns

Number of bytes of underlying numpy arrays

Return type

int

normalize(density=False)[source]

Divides counts of each bin by the sum of the total counts. If density=True, also divide by bin widths.

Returns

Return type

Hist
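
The arithmetic described above can be written out in a few lines of plain Python. This is an illustrative sketch on lists of counts and edges, not the method itself; normalize_counts is a hypothetical name.

```python
def normalize_counts(counts, edges, density=False):
    """Divide counts by their sum, and also by the bin widths if density=True."""
    total = sum(counts)
    out = [c / total for c in counts]
    if density:
        widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
        out = [v / w for v, w in zip(out, widths)]
    return out


unit_sum = normalize_counts([1, 3], [0, 2, 4])
unit_area = normalize_counts([1, 3], [0, 2, 4], density=True)
```

With density=True the result integrates to 1 over the axis (sum of value times bin width) instead of summing to 1.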

plot(ax=None, histtype='step', legend=True, counts=False, errors=False, fmt='o', label=None, color=None, counts_formatter=<built-in method format of str object>, counts_fontsize=10, interactive=False, **kwargs)[source]

Plot this histogram object using matplotlib’s hist function, or errorbar (depending on the value of the errors argument).

Parameters
  • ax (matplotlib AxesSubplot object, default None) – matplotlib AxesSubplot object. Created if None.

  • color (str, default None) – If None, uses default matplotlib color cycler

  • counts (bool, default False) – If True, show text labels for counts (and/or errors). See counts_formatter and counts_fontsize.

  • counts_formatter (callable, default “{:3g}”.format) – Two-parameter function used to format count and error labels. Thus, if a second placeholder is specified (e.g., “{:3g} +- {:3g}”.format), the bin error can be shown as well.

  • counts_fontsize (int, default 10) – Font size of count labels

  • errors (bool, default False) – If True, plot markers with error bars (ax.errorbar()) instead of ax.hist().

  • fmt (str, default "o") – fmt kwarg used for matplotlib plotting

  • label (str, default None) – Label for legend entry

  • interactive (bool, default False) – Use plotly to make an interactive plot. See Hist1D.plot_plotly().

  • legend (bool, default True) – If True and the histogram has a label, draw the legend

  • **kwargs – Parameters to be passed to matplotlib's errorbar (if errors=True) or hist (otherwise) function.

Returns

Return type

matplotlib AxesSubplot object

plot_plotly(fig=None, color=None, errors=False, log=False, label=None, flipxy=False, alpha=1, stack=False, **kwargs)[source]
quantile(q)[source]

Returns the bin center corresponding to the quantile(s) q. Similar to np.quantile.

Parameters

q (float, or array of floats) – quantile between 0 and 1

Returns

Return type

float, or array of floats
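
One plausible reading of "bin center corresponding to the quantile" is the center of the first bin at which the cumulative fraction of counts reaches q. The sketch below implements that reading as an assumption; it is not taken from yahist. The counts are those produced by the fill() example above (values 0, 1, 1, 2, 2, 3, 4 in ten unit-width bins).

```python
def quantile_center(counts, edges, q):
    """Center of the first bin whose cumulative fraction reaches q."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(counts)
    running = 0.0
    for center, count in zip(centers, counts):
        running += count
        if running / total >= q:
            return center
    return centers[-1]  # guard against round-off for q close to 1


counts = [1, 2, 2, 1, 1, 0, 0, 0, 0, 0]
edges = list(range(11))
median = quantile_center(counts, edges, 0.5)
```

Under this reading the median of the fill() example comes out as 2.5, in agreement with the value shown there.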

rebin(nrebin)[source]

Combines adjacent bins by summing contents. The total number of bins for each axis must be exactly divisible by nrebin.

Parameters

nrebin (int) – Number of adjacent bins to combine into one bin.

Returns

Return type

Hist1D
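
The rebinning rule can be sketched on plain lists as follows. Only counts and edges are handled here (errors are left out), and rebin_counts is a hypothetical helper rather than the method itself.

```python
def rebin_counts(counts, edges, nrebin):
    """Sum every nrebin adjacent bins and keep every nrebin-th edge."""
    if len(counts) % nrebin != 0:
        raise ValueError("number of bins must be divisible by nrebin")
    new_counts = [sum(counts[i:i + nrebin]) for i in range(0, len(counts), nrebin)]
    new_edges = edges[::nrebin]
    return new_counts, new_edges


new_counts, new_edges = rebin_counts([1, 2, 3, 4], [0, 1, 2, 3, 4], 2)
```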

restrict(low=None, high=None)[source]

Restricts to a contiguous subset of bins with bin center values within [low, high]. If low/high is None, there is no lower/upper bound

Parameters
  • low (float (default None)) – Lower x center to keep

  • high (float (default None)) – Highest x center to keep

Returns

Return type

Hist1D
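
Selecting bins by their centers can be sketched like this. The helper name restrict_bins and the behaviour for an empty selection are assumptions of the example.

```python
def restrict_bins(counts, edges, low=None, high=None):
    """Keep the bins whose centers lie within [low, high]."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    keep = [
        i for i, c in enumerate(centers)
        if (low is None or c >= low) and (high is None or c <= high)
    ]
    if not keep:
        return [], []  # assumption: nothing selected yields an empty result
    first, last = keep[0], keep[-1]
    # n kept bins need n+1 edges.
    return counts[first:last + 1], edges[first:last + 2]


sub_counts, sub_edges = restrict_bins([1, 2, 3, 4], [0, 1, 2, 3, 4], low=1, high=3)
```

Because the comparison uses bin centers, a bin spanning [0, 1] is dropped by low=1 even though its upper edge touches the limit.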

sample(size=100000.0)[source]

Returns an array of random samples according to a discrete pdf from this histogram.

Parameters

size (int/float, default 1e5) – Number of random values to sample

Returns

Return type

array
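
Sampling from a histogram treated as a discrete pdf can be sketched with random.choices, drawing bin centers with probability proportional to the bin counts. Returning bin centers is an assumption about what "discrete pdf" means here, and sample_centers is a hypothetical name.

```python
import random


def sample_centers(counts, edges, size=100000.0, seed=None):
    """Draw int(size) bin centers with probability proportional to the counts."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    rng = random.Random(seed)
    return rng.choices(centers, weights=counts, k=int(size))


samples = sample_centers([0, 3, 1], [0, 1, 2, 3], size=1e3, seed=42)
```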

scale(factor)[source]

Alias for multiplication

Returns

Return type

Hist

std()[source]

Returns the standard deviation of the histogram

Returns

standard deviation of histogram (or, RMS)

Return type

float
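
For binned data, the mean and standard deviation are commonly computed as count-weighted moments of the bin centers. The sketch below assumes that convention (it is not confirmed by this page), with mean_std as a hypothetical helper.

```python
import math


def mean_std(counts, edges):
    """Count-weighted mean and standard deviation of the bin centers."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(counts)
    mean = sum(c * x for c, x in zip(counts, centers)) / total
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, centers)) / total
    return mean, math.sqrt(var)


mean, std = mean_std([1, 1], [0, 1, 2])
```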

svg(**kwargs)[source]

Return HTML svg tag with Matplotlib-rendered svg.

Parameters

**kwargs – Parameters to be passed to self.plot() function.

Returns

Return type

str

svg_fast(height=250, aspectratio=1.4, padding=0.02, strokewidth=1, color=None, bottom=True, frame=True)[source]

Return HTML svg tag with bare-bones version of histogram (no ticks, labels).

Parameters
  • height (int, default 250) – Height of plot in pixels

  • padding (float, default 0.02) – Fraction of height or width to keep between edges of plot and svg view size

  • aspectratio (float, default 1.4) – Aspect ratio of plot

  • strokewidth (float, default 1) – Width of strokes

  • bottom (bool, default True) – Draw line at the bottom

  • color (str, default None) – Stroke color and fill color (with 15% opacity). If color is in the histogram metadata, it will take precedence.

  • frame (bool, default True) – Draw frame/border

Returns

Return type

str

to_json(obj=None)[source]

Returns json-serialized version of this object.

Parameters

obj (str, default None) – If specified, writes json to path instead of returning string. If the path ends with ‘.gz’, compresses with gzip.

Returns

Return type

str

to_poisson_errors(alpha=0.3173)[source]

Converts Hist object into one with asymmetric Poissonian errors, inside the errors_up and errors_down properties.

Parameters

alpha (float, default 1-0.6827) – Confidence interval for errors. 1-sigma by default.

Returns

Return type

Hist1D

yahist.hist2d module

class yahist.hist2d.Hist2D(obj=[], **kwargs)[source]

Bases: yahist.hist1d.Hist1D

Constructs a Hist2D object from a variety of inputs

Parameters
  • obj (a 2D array of numbers to histogram, another Hist2D object, or a tuple of x and y values) –

  • kwargs

    bins: list/array of bin edges, number of bins, or string, default 10

    Follows usage for np.histogram2d, with addition of string specification

    range: list/array of axis ranges, default None

    Follows usage for np.histogram2d

    weights: list/array of weights, default None

    Follows usage for np.histogram2d

    threads: int, default 0

    Number of threads to use for histogramming. If 0, autodetect (within boost_histogram)

    overflow: bool, default True

    Include overflow counts in outermost bins

    metadata: dict, default {}

    Attach arbitrary extra data to this object

Returns

Return type

Hist2D

Examples

>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> Hist2D(np.c_[x,y], bins=np.linspace(-5,5,11))
>>> Hist2D((x,y), bins="10,-5,5")
>>> Hist2D((x,y), bins="10,-5,5,20,-3,3")
>>> df = pd.DataFrame(dict(x=x,y=y))
>>> Hist2D(df[["x","y"]], bins="10,-5,5", threads=4)
property bin_centers

Returns the centers of bins.

Returns

Bin centers

Return type

array

property bin_widths

Returns the widths of bins.

Returns

Bin widths

Return type

array

canvas(height=250, aspectratio=1.4)[source]

Return HTML5 canvas tag similar to self.svg().

Parameters
  • height (int, default 250) – Height of plot in pixels

  • aspectratio (float, default 1.4) – Aspect ratio of plot

Returns

Return type

str

correlation()[source]

Returns the correlation factor between the x and y axes, matching the routine in https://root.cern.ch/doc/master/TH2_8cxx_source.html#l01044

Returns

Return type

float
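
A correlation factor of this kind is the count-weighted Pearson correlation of the bin centers. The sketch below computes it in plain Python under the assumption that counts are stored as counts[iy][ix]; it is an illustration, not a transcription of the ROOT routine.

```python
import math


def correlation_factor(counts, x_centers, y_centers):
    """Weighted Pearson correlation of bin centers, weights = bin counts."""
    cells = [
        (x, y, counts[iy][ix])
        for iy, y in enumerate(y_centers)
        for ix, x in enumerate(x_centers)
    ]
    total = sum(w for _, _, w in cells)
    mx = sum(w * x for x, _, w in cells) / total
    my = sum(w * y for _, y, w in cells) / total
    cov = sum(w * (x - mx) * (y - my) for x, y, w in cells) / total
    sx = math.sqrt(sum(w * (x - mx) ** 2 for x, _, w in cells) / total)
    sy = math.sqrt(sum(w * (y - my) ** 2 for _, y, w in cells) / total)
    return cov / (sx * sy)


diagonal = correlation_factor([[1, 0], [0, 1]], [0.5, 1.5], [0.5, 1.5])
antidiagonal = correlation_factor([[0, 1], [1, 0]], [0.5, 1.5], [0.5, 1.5])
```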

cumulative(forwardx=True, forwardy=True)[source]

Turns Hist object into one with cumulative counts.

Parameters
  • forwardx (bool, default True) – If true, sum the x-axis from low to high, otherwise high to low. If None, do not sum along this axis.

  • forwardy (bool, default True) – If true, sum the y-axis from low to high, otherwise high to low. If None, do not sum along this axis.

Returns

Return type

Hist2D

html_table()[source]

Return dummy HTML table tag.

Returns

Return type

str

lookup(x, y)[source]

Convert a specified list of x-values and y-values into corresponding bin counts via np.digitize

Parameters
  • x (array of x-values, or single x-value) –

  • y (array of y-values, or single y-value) –

Returns

Return type

array

property nbins

Returns the number of bins

Returns

Number of bins

Return type

int

plot(ax=None, fig=None, colorbar=True, hide_empty=True, counts=False, counts_formatter=<built-in method format of str object>, counts_fontsize=12, logz=False, equidistant='', interactive=False, **kwargs)[source]

Plot this histogram object using matplotlib’s pcolorfast/pcolormesh function.

Parameters
  • ax (matplotlib AxesSubplot object, default None) – matplotlib AxesSubplot object. Created if None.

  • fig (matplotlib Figure object, default None) – matplotlib Figure object. Created if None.

  • counts (bool, default False) – If True, show text labels for counts (and/or errors). See counts_formatter and counts_fontsize.

  • counts_formatter (callable, default “{:3g}”.format) – Two-parameter function used to format count and error labels. Thus, if a second placeholder is specified (e.g., “{:3g}\n$\pm$ {:3g}”.format), the bin error can be shown as well.

  • counts_fontsize (int, default 12) – Font size of count labels

  • colorbar (bool, default True) – Show colorbar

  • equidistant (str, default "") – If not an empty string, make bins equally-spaced in the x-axis (equidistant=”x”), y-axis (“y”), or both (“xy”).

  • hide_empty (bool, default True) – Don’t draw empty bins (content==0)

  • interactive (bool, default False) – Use plotly to make an interactive plot

  • logz (bool, default False) – Use logscale for z-axis

  • **kwargs – Parameters to be passed to matplotlib pcolorfast/pcolormesh function.

Returns

Return type

2-tuple of (pcolorfast/pcolormesh output, matplotlib AxesSubplot object)

plot_plotly(fig=None, cmap=None, logz=False, label=None, hide_empty=True, **kwargs)[source]
profile(axis='x')[source]

Returns the x/y-profile of the 2d histogram by calculating the weighted mean over the y/x-axis.

Parameters

axis (str/int (default "x")) – if “x” or 0, return the x-profile (mean over y-axis); if “y” or 1, return the y-profile (mean over x-axis)

Returns

Return type

Hist1D

projection(axis='x')[source]

Returns the x/y-projection of the 2d histogram by summing over the y/x-axis.

Parameters

axis (str/int (default "x")) – if “x” or 0, return the x-projection (summing over y-axis); if “y” or 1, return the y-projection (summing over x-axis)

Returns

Return type

Hist1D
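
The difference between a projection (sum over the other axis) and a profile (count-weighted mean of the other axis) can be seen in the plain-Python sketch below. It assumes counts are stored as counts[iy][ix]; both helper names are hypothetical, and returning 0.0 for empty columns is a choice made for this example.

```python
def projection_counts(counts, axis="x"):
    """Sum over the y-axis (axis "x" or 0) or over the x-axis (axis "y" or 1)."""
    if axis in ("x", 0):
        return [sum(column) for column in zip(*counts)]
    return [sum(row) for row in counts]


def profile_x(counts, y_centers):
    """For each x bin, the count-weighted mean of the y bin centers."""
    out = []
    for column in zip(*counts):
        total = sum(column)
        weighted = sum(c * y for c, y in zip(column, y_centers))
        out.append(weighted / total if total else 0.0)
    return out


counts2d = [[1, 2], [3, 4]]  # rows are y bins, columns are x bins
proj_x = projection_counts(counts2d, "x")
proj_y = projection_counts(counts2d, "y")
prof_x = profile_x(counts2d, [0.5, 1.5])
```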

rebin(nrebinx, nrebiny=None)[source]

Combines adjacent bins by summing contents. The total number of bins for the x-axis (y-axis) must be exactly divisible by nrebinx (nrebiny). Based on the method in https://stackoverflow.com/questions/44527579/whats-the-best-way-to-downsample-a-numpy-array.

Parameters
  • nrebinx (int) – Number of adjacent x-axis bins to combine into one bin.

  • nrebiny (int) – Number of adjacent y-axis bins to combine into one bin.

Returns

Return type

Hist2D
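
Two-dimensional rebinning amounts to summing rectangular blocks of bins. The sketch below works on nested lists stored as counts[iy][ix]. Using nrebinx for both axes when nrebiny is None is an assumption, and rebin2d is a hypothetical helper rather than the reshape-based method cited above.

```python
def rebin2d(counts, nrebinx, nrebiny=None):
    """Sum blocks of nrebiny x nrebinx adjacent bins."""
    if nrebiny is None:
        nrebiny = nrebinx  # assumption: same factor for both axes
    ny, nx = len(counts), len(counts[0])
    if nx % nrebinx or ny % nrebiny:
        raise ValueError("bin counts must be divisible by the rebin factors")
    return [
        [
            sum(
                counts[iy + dy][ix + dx]
                for dy in range(nrebiny)
                for dx in range(nrebinx)
            )
            for ix in range(0, nx, nrebinx)
        ]
        for iy in range(0, ny, nrebiny)
    ]


merged = rebin2d([[1, 2, 3, 4], [5, 6, 7, 8]], 2)
merged_x_only = rebin2d([[1, 2, 3, 4], [5, 6, 7, 8]], 2, 1)
```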

restrict(xlow=None, xhigh=None, ylow=None, yhigh=None)[source]

Restricts to a contiguous subset of bins with bin center values within [[xlow, xhigh], [ylow,yhigh]]. If any limit is None, the specified direction will be unbounded.

Parameters
  • xlow (float (default None)) – Lower x center to keep

  • xhigh (float (default None)) – Highest x center to keep

  • ylow (float (default None)) – Lower y center to keep

  • yhigh (float (default None)) – Highest y center to keep

Returns

Return type

Hist2D

sample(size=100000.0)[source]

Returns a 2-column array of random samples according to a discrete pdf from this histogram.

>>> h1 = Hist2D.from_random()
>>> h2 = Hist2D(h1.sample(100), bins=h1.edges)
Parameters

size (int/float, default 1e5) – Number of random values to sample

Returns

Return type

array

smooth(ntimes=3, window=3)[source]

Returns a smoothed Hist2D via convolution with three kernels used by https://root.cern.ch/doc/master/TH2_8cxx_source.html#l02600

Parameters
  • ntimes (int (default 3)) – Number of times to repeat smoothing

  • window (int (default 3)) – Kernel size (1, 3, 5 supported)

Returns

Return type

Hist2D

svg(ticks=True, **kwargs)[source]

Return HTML svg tag with Matplotlib-rendered svg.

Parameters
  • ticks (bool, default True) – Show x/y ticks and labels

  • **kwargs – Parameters to be passed to self.plot() function.

Returns

Return type

str

svg_fast(height=250, aspectratio=1.4, interactive=False)[source]

Return HTML svg tag with bare-bones version of histogram (no ticks, labels).

Parameters
  • height (int, default 250) – Height of plot in pixels

  • aspectratio (float, default 1.4) – Aspect ratio of plot

  • interactive (bool, default False) – Whether to display bin contents on mouse hover.

Returns

Return type

str

transpose()[source]

Returns the transpose of the Hist2D

Returns

Return type

Hist2D

yahist.utils module

yahist.utils.binomial_obs_z(data, bkg, bkgerr)[source]

Calculate pull values according to https://root.cern.ch/doc/v606/NumberCountingUtils_8cxx_source.html#l00137. The scipy version is vectorized, so you can feed in arrays. If gaussian_fallback, return a simple Gaussian pull when the data count is 0; otherwise both ROOT and scipy will return inf/nan.

yahist.utils.clopper_pearson_error(passed, total, level=0.6827)[source]

Matches TEfficiency::ClopperPearson():

>>> ROOT.TEfficiency.ClopperPearson(total, passed, level, is_upper)

yahist.utils.compute_darkness(r, g, b, a=1.0)[source]
yahist.utils.convert_dates(obj)[source]
yahist.utils.darken_color(color, amount=0.2)[source]
yahist.utils.draw_error_band(h, ax=None, **kwargs)[source]
yahist.utils.has_uniform_spacing(obj, epsilon=1e-06)[source]
yahist.utils.histogramdd_wrapper(a, bins, range_, weights, overflow, threads)[source]
yahist.utils.ignore_division_errors(f)[source]
yahist.utils.is_datelike(obj)[source]
yahist.utils.is_listlike(obj)[source]
yahist.utils.nan_to_num(f)[source]
yahist.utils.plot_stack(hists, **kwargs)[source]

Plots a list of Hist1D objects as a stack

Parameters
  • hists (list of Hist1D objects) –

  • kwargs (passed to Hist1D.plot()) –

yahist.utils.poisson_errors(obs, alpha=0.3173)[source]

Return poisson low and high values for a series of data observations

yahist.utils.register_with_dask(classes)[source]

Register classes with dask so that it can serialize the underlying numpy arrays a bit faster

yahist.utils.set_default_style()[source]

Module contents