ENH: add support for python3.6 memory tracing #8885

Merged (2 commits, Apr 30, 2017)
52 changes: 31 additions & 21 deletions doc/release/1.13.0-notes.rst
@@ -30,6 +30,8 @@ Deprecations
Build System Changes
====================

* ``numpy.distutils`` now automatically determines C-file dependencies with
GCC compatible compilers.

Future Changes
==============
@@ -135,7 +137,7 @@ for duplicate N-1-dimensional elements using ``numpy.unique``. The original
behaviour is recovered if ``axis=None`` (default).

``np.gradient`` now supports unevenly spaced data
------------------------------------------------
-------------------------------------------------
Users can now specify non-constant spacing for the data.
In particular ``np.gradient`` can now take:

@@ -200,10 +202,20 @@ of arrays.

It is similar to Matlab's square bracket notation for creating block matrices.
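
For example, a larger array can be assembled from smaller blocks (an
illustrative sketch)::

    >>> A = np.eye(2)
    >>> np.block([[A, np.zeros((2, 2))],
    ...           [np.ones((2, 2)), A]])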

Numpy may be built with relaxed stride checking debugging
Support for tracemalloc in Python 3.6
-------------------------------------
NumPy now supports memory tracing with the tracemalloc_ module of Python 3.6
or newer. Memory allocations from NumPy are placed into the domain defined by
``numpy.lib.tracemalloc_domain``.
Note that NumPy allocations will not show up in tracemalloc_ on earlier Python
versions.
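
For example, a tracemalloc snapshot can be restricted to NumPy's own
allocations (a sketch, assuming Python 3.6 or newer)::

    >>> import tracemalloc
    >>> tracemalloc.start()
    >>> a = np.zeros((100, 100))
    >>> snapshot = tracemalloc.take_snapshot()
    >>> numpy_only = snapshot.filter_traces(
    ...     [tracemalloc.DomainFilter(True, np.lib.tracemalloc_domain)])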

.. _tracemalloc: https://docs.python.org/3/library/tracemalloc.html

NumPy may be built with relaxed stride checking debugging
---------------------------------------------------------
Setting NPY_RELAXED_STRIDES_DEBUG=1 in the environment when relaxed stride
checking is enabled will cause numpy to be compiled with the affected strides
checking is enabled will cause NumPy to be compiled with the affected strides
set to the maximum value of npy_intp in order to help detect invalid usage of
the strides in downstream projects. When enabled, invalid usage often results
in an error being raised, but the exact type of error depends on the details of
@@ -241,9 +253,9 @@ been optimized to be significantly faster for contiguous data.

Fix for PPC long double floating point information
--------------------------------------------------
In previous versions of numpy, the ``finfo`` function returned invalid
In previous versions of NumPy, the ``finfo`` function returned invalid
information about the `double double`_ format of the ``longdouble`` float type
on Power PC (PPC). The invalid values resulted from the failure of the numpy
on Power PC (PPC). The invalid values resulted from the failure of the NumPy
algorithm to deal with the variable number of digits in the significand
that are a feature of `PPC long doubles`. This release by-passes the failing
algorithm by using heuristics to detect the presence of the PPC double double
@@ -252,8 +264,6 @@ function is faster than previous releases.

.. _PPC long doubles: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.genprogc/128bit_long_double_floating-point_datatype.htm

.. _issues: https://github.com/numpy/numpy/issues/2669

.. _double double: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic

Better default repr for ``ndarray`` subclasses
@@ -282,14 +292,6 @@ All of the following functions in ``np.linalg`` now work when given input
arrays with a 0 in the last two dimensions: ``det``, ``slogdet``, ``pinv``,
``eigvals``, ``eigvalsh``, ``eig``, ``eigh``.

``argsort`` on masked arrays takes the same default arguments as ``sort``
-------------------------------------------------------------------------
By default, ``argsort`` now places the masked values at the end of the sorted
array, in the same way that ``sort`` already did. Additionally, the
``end_with`` argument is added to ``argsort``, for consistency with ``sort``.
Note that this argument is not added at the end, so breaks any code that
passed ``fill_value`` as a positional argument.

Bundled version of LAPACK is now 3.2.2
--------------------------------------
NumPy comes bundled with a minimal implementation of lapack for systems without
@@ -314,8 +316,8 @@ Ufunc behavior for overlapping inputs
-------------------------------------

Operations where ufunc input and output operands have memory overlap
produced undefined results in previous Numpy versions, due to data
dependency issues. In Numpy 1.13.0, results from such operations are
produced undefined results in previous NumPy versions, due to data
dependency issues. In NumPy 1.13.0, results from such operations are
now defined to be the same as for equivalent operations where there is
no memory overlap.

@@ -333,7 +335,7 @@ To illustrate a previously undefined operation::
>>> x = np.arange(16).astype(float)
>>> np.add(x[1:], x[:-1], out=x[1:])

In Numpy 1.13.0 the last line is guaranteed to be equivalent to::
In NumPy 1.13.0 the last line is guaranteed to be equivalent to::

>>> np.add(x[1:].copy(), x[:-1].copy(), out=x[1:])

@@ -342,7 +344,7 @@ A similar operation with simple non-problematic data dependence is::
>>> x = np.arange(16).astype(float)
>>> np.add(x[1:], x[:-1], out=x[:-1])

It will continue to produce the same results as in previous Numpy
It will continue to produce the same results as in previous NumPy
versions, and will not involve unnecessary temporary copies.

The change applies also to in-place binary operations, for example::
@@ -351,12 +353,20 @@ The change applies also to in-place binary operations, for example::
>>> x += x.T

This statement is now guaranteed to be equivalent to ``x[...] = x + x.T``,
whereas in previous Numpy versions the results were undefined.
whereas in previous NumPy versions the results were undefined.

``argsort`` on masked arrays takes the same default arguments as ``sort``
-------------------------------------------------------------------------
By default, ``argsort`` now places the masked values at the end of the sorted
array, in the same way that ``sort`` already did. Additionally, the
``end_with`` argument is added to ``argsort``, for consistency with ``sort``.
Note that this argument is not added at the end, so breaks any code that
passed ``fill_value`` as a positional argument.
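
As an illustration (a hedged sketch of the new default)::

    >>> a = np.ma.array([3, 1, 2], mask=[False, True, False])
    >>> a.argsort()   # the masked entry (index 1) now sorts to the end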

``average`` now preserves subclasses
------------------------------------
For ndarray subclasses, ``numpy.average`` will now return an instance of the
subclass, matching the behavior of most other numpy functions such as ``mean``.
subclass, matching the behavior of most other NumPy functions such as ``mean``.
As a consequence, calls that previously returned a scalar may now return a
subclass array scalar.
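
A rough sketch of the effect (``MyArray`` is an illustrative subclass)::

    >>> class MyArray(np.ndarray):
    ...     pass
    >>> a = np.arange(6.0).reshape(2, 3).view(MyArray)
    >>> type(np.average(a, axis=0))   # expected to be MyArray, not ndarray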

20 changes: 20 additions & 0 deletions numpy/core/src/multiarray/alloc.c
@@ -2,12 +2,25 @@
#include <Python.h>
#include "structmember.h"

#if PY_VERSION_HEX >= 0x03060000
#include <pymem.h>
/* PyTraceMalloc_Track/Untrack are public API only from Python 3.7;
   fall back to the private names on 3.6 */
#if PY_VERSION_HEX < 0x03070000
#define PyTraceMalloc_Track _PyTraceMalloc_Track
#define PyTraceMalloc_Untrack _PyTraceMalloc_Untrack
#endif
#else
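/* the C tracking API only exists from Python 3.6 on: compile the
   tracking calls away on older versions */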
#define PyTraceMalloc_Track(...)
#define PyTraceMalloc_Untrack(...)
#endif

#define NPY_NO_DEPRECATED_API NPY_API_VERSION
#define _MULTIARRAYMODULE
#include <numpy/ndarraytypes.h>
#include "numpy/arrayobject.h"
#include <numpy/npy_common.h>
#include "npy_config.h"
#include "alloc.h"

#include <assert.h>

@@ -192,6 +205,7 @@ PyDataMem_NEW(size_t size)
}
NPY_DISABLE_C_API
}
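/* report the allocation to tracemalloc under NumPy's own domain
   (a no-op on Python versions before 3.6, see the macros above) */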
PyTraceMalloc_Track(NPY_TRACE_DOMAIN, (npy_uintp)result, size);
return result;
}

@@ -213,6 +227,7 @@ PyDataMem_NEW_ZEROED(size_t size, size_t elsize)
}
NPY_DISABLE_C_API
}
PyTraceMalloc_Track(NPY_TRACE_DOMAIN, (npy_uintp)result, size);
return result;
}

@@ -222,6 +237,7 @@ PyDataMem_NEW_ZEROED(size_t size, size_t elsize)
NPY_NO_EXPORT void
PyDataMem_FREE(void *ptr)
{
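/* untrack the pointer with tracemalloc before it is actually freed */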
PyTraceMalloc_Untrack(NPY_TRACE_DOMAIN, (npy_uintp)ptr);
free(ptr);
if (_PyDataMem_eventhook != NULL) {
NPY_ALLOW_C_API_DEF
@@ -243,6 +259,10 @@ PyDataMem_RENEW(void *ptr, size_t size)
void *result;

result = realloc(ptr, size);
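/* if realloc moved the block, untrack the old address; tracking the new
   address below also refreshes the size recorded by tracemalloc */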
if (result != ptr) {
PyTraceMalloc_Untrack(NPY_TRACE_DOMAIN, (npy_uintp)ptr);
}
PyTraceMalloc_Track(NPY_TRACE_DOMAIN, (npy_uintp)result, size);
if (_PyDataMem_eventhook != NULL) {
NPY_ALLOW_C_API_DEF
NPY_ALLOW_C_API
2 changes: 2 additions & 0 deletions numpy/core/src/multiarray/alloc.h
@@ -4,6 +4,8 @@
#define _MULTIARRAYMODULE
#include <numpy/ndarraytypes.h>

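/* tracemalloc domain used to tag NumPy data allocations; exposed to Python
   as numpy.lib.tracemalloc_domain */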
#define NPY_TRACE_DOMAIN 389047

NPY_NO_EXPORT void *
npy_alloc_cache(npy_uintp sz);

5 changes: 5 additions & 0 deletions numpy/core/src/multiarray/multiarraymodule.c
@@ -60,6 +60,7 @@ NPY_NO_EXPORT int NPY_NUMUSERTYPES = 0;
#include "templ_common.h" /* for npy_mul_with_overflow_intp */
#include "compiled_base.h"
#include "mem_overlap.h"
#include "alloc.h"

#include "get_attr_string.h"

@@ -4624,6 +4625,10 @@ PyMODINIT_FUNC initmultiarray(void) {
*/
PyDict_SetItemString (d, "error", PyExc_Exception);

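/* expose the tracemalloc domain id to Python; numpy.lib re-exports it
   as numpy.lib.tracemalloc_domain */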
s = PyInt_FromLong(NPY_TRACE_DOMAIN);
PyDict_SetItemString(d, "tracemalloc_domain", s);
Py_DECREF(s);

s = PyUString_FromString("3.1");
PyDict_SetItemString(d, "__version__", s);
Py_DECREF(s);
3 changes: 2 additions & 1 deletion numpy/lib/__init__.py
@@ -25,8 +25,9 @@
from .arrayterator import Arrayterator
from .arraypad import *
from ._version import *
from numpy.core.multiarray import tracemalloc_domain

__all__ = ['emath', 'math']
__all__ = ['emath', 'math', 'tracemalloc_domain']
__all__ += type_check.__all__
__all__ += index_tricks.__all__
__all__ += function_base.__all__
11 changes: 11 additions & 0 deletions tools/allocation_tracking/README.md
@@ -0,0 +1,11 @@
An example of using `PyDataMem_SetEventHook` to track allocations inside NumPy.

`alloc_hook.pyx` implements a hook in Cython that calls back into a Python
function. `track_allocations.py` uses it for a simple listing of allocations.
It can be built with the `setup.py` file in this folder.

Note that since Python 3.6 the built-in `tracemalloc` module can be used to
track allocations inside NumPy.
NumPy places its CPU memory allocations into the `np.lib.tracemalloc_domain`
domain.
See https://docs.python.org/3/library/tracemalloc.html.
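
A minimal, illustrative snippet (assuming Python 3.6+ and a NumPy built with
this support) that lists only NumPy's allocations:

```python
import tracemalloc

import numpy as np

tracemalloc.start()
x = np.ones((512, 512))  # recorded in NumPy's tracemalloc domain
snapshot = tracemalloc.take_snapshot()

# keep only the traces that come from NumPy's domain
numpy_traces = snapshot.filter_traces(
    [tracemalloc.DomainFilter(inclusive=True,
                              domain=np.lib.tracemalloc_domain)])

for stat in numpy_traces.statistics('lineno'):
    print(stat)
```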