kmillar edited this page Sep 15, 2014 · 1 revision

CXXR: Porting

This page gives guidance on porting R code and C packages to work with CXXR, and on some coding practices that may affect the portability of CXXR to different platforms.

R Code

In general, if R code behaves differently under CXXR and under the standard C-based R (in the release of CR on which this release of CXXR is based), then that is a bug in CXXR: please report it. However, differences in timing and in space consumption are to be expected. The following differences are intentional:

  • CXXR does not permit the S4 object flag to be unset (e.g. using asS4()) for an object of type S4 (S4SXP). Update: this restriction was applied in release 0.17-2.7.2, but because of an issue in the methods package it is no longer applied as of release 0.18-2.8.1; it may be reapplied in future.
  • CXXR cannot deserialize CR's 'pre-version-1' serialization format. Please advise if this causes you problems; otherwise fixing it is of very low priority.
  • Setting attributes on symbol objects (SYMSXP) is strongly discouraged, and may be forbidden in future. Attributes may not be set on string (CHARSXP) objects.
  • If an uncaught error occurs in evaluating an on.exit expression, the error is reported, but it does not result in an immediate return to top-level: the function to which the on.exit relates returns in the same way as if the on.exit expression had not resulted in an error. This behaviour does not conform to Sec. 8.3 of the R Language Definition, but fixing it is not seen as a priority. Let me know if this causes you problems (or if you actually prefer this behaviour!).
  • If x is a one-dimensional array with dimnames, then in CXXR evaluating x[] preserves the dimnames; in CR they are discarded.
  • The functionality of tracemem (and untracemem and retracemem) is currently unavailable in CXXR, even if CXXR is configured with --enable-memory-profiling. Let me know if this is a serious nuisance: otherwise fixing it is of low priority.

Also, there may be differences in the behaviour of R functions which probe into the implementation of the interpreter. In particular:

  • gc() now returns different quantities, in the form of a matrix with three rows and two columns: see the revised help page. The reporting enabled by gcinfo(TRUE) in CR is not currently implemented in CXXR; in other words, gcinfo() is a no-op. Likewise gctorture() is a no-op in CXXR, whose aggressive approach to garbage collection will generally expose memory-protection bugs without help!
  • Although mem.limits(nsize, vsize) retains the same interface, the interpretation of the quantities involved has changed somewhat; likewise the corresponding command-line options to R. See the revised Memory help page for details.
  • Memory profiling, as accessed via Rprofmem in package utils, has not (yet) been properly reengineered for CXXR, and the relevant code has not been tested. Do not rely on the CR behaviour to persist.
  • The function memory.profile() performs no useful function in CXXR: it simply returns a vector of zeroes.
  • In CXXR, the configuration option --with-valgrind-instrumentation is not used. If valgrind is to be used with the memcheck tool, it is recommended that MemoryBank.cpp be recompiled with the preprocessor variable NO_CELLPOOLS defined. Then CXXR will allocate all memory blocks directly via C++'s ::operator new (rather than using CXXR's internal memory pools implemented by class CellPool), and such memory blocks will therefore be monitored by memcheck.
  • The hash parameter to new.env() is ignored, and env.profile() always returns NULL.

Moreover, while CXXR is at an alpha development stage, internal logic errors (i.e. errors due to bugs in the interpreter rather than to bugs in the R code it is interpreting) will sometimes cause the interpreter itself to terminate, even in circumstances where CR would manage to recover to the top-level prompt. This is intentional in order to 'preserve the scene of crime' for debugging. Also, typing Control-C will currently cause the CXXR interpreter to terminate, rather than returning to the top-level prompt: if this is a nuisance, change the setting of R_SignalHandlers at around line 809 in main.cpp from 0 to 1, and rebuild.

C Code

Latent memory protection bugs

C and C++ code that appears to work under CR can often contain latent memory protection bugs that will only manifest themselves when a garbage collection occurs at a particular point in execution. Such bugs are very likely to become hard failures under CXXR: refer to this page for advice on diagnosing such bugs.

Mandatory changes for code using R.h or S.h:

  • C code that calls (directly or indirectly) certain CXXR internal functions that are now implemented in C++ should be compiled in such a way that C++ exceptions are propagated correctly. For example, using gcc this can be achieved by specifying the compiler flag -fexceptions. The most likely case where this will be necessary is in code that calls error() (aka Rf_error); without this change, the R interpreter may not return correctly to the top-level R prompt following an error. (It is intended to remove this requirement in a future release.)
  • In CR, for historical reasons, R_alloc() and kindred functions always return a memory block containing at least one more byte than the number requested. This cannot be relied upon in CXXR.
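As an illustration of the R_alloc() point, the following C++ sketch copies a prefix of a string into a freshly allocated buffer. The function alloc_exact() is a hypothetical stand-in for R_alloc(), not part of any R or CXXR API; the point is only that the terminating NUL byte is requested explicitly, rather than left to an extra byte that CXXR does not supply.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for R_alloc(): like CXXR, it supplies exactly the
// number of bytes requested and no more. (Unlike R_alloc(), whose memory R
// reclaims automatically, this one uses malloc, so the caller must free it.)
char* alloc_exact(std::size_t nbytes) {
    return static_cast<char*>(std::malloc(nbytes));
}

// Copies the first n characters of src into a new NUL-terminated buffer.
// Requesting n + 1 bytes explicitly avoids relying on CR's historical
// extra byte to hold the terminator.
char* copy_prefix(const char* src, std::size_t n) {
    char* buf = alloc_exact(n + 1);    // n characters plus '\0'
    assert(buf != NULL);
    std::memcpy(buf, src, n);
    buf[n] = '\0';
    return buf;
}
```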

Mandatory changes for code using Rinternals.h:

The following changes are required in addition to those listed above for code using R.h or S.h:

  • R_NilValue in CXXR is simply a macro expanding to NULL (which will typically be further macro-expanded to (void*)0 in C, or to plain 0 or nullptr in C++), rather than a pointer to a real object. To ease the transition, the following changes have been made to accessor functions:
    • CAR(), CDR(), TAG() and ATTRIB() each return a null pointer if passed a null pointer.
    • OBJECT() and IS_S4_OBJECT() each return FALSE if passed a null pointer.
    • LENGTH(), NAMED() and TRACE() each return 0 if passed a null pointer.
    • SET_NAMED() is a no-op if the first argument is a null pointer.

However, other accessor functions are likely to crash if invoked for R_NilValue: the calling code should introduce appropriate checks and workarounds.

In existing C/C++ code, it occasionally happens that a function is designed to return R_NilValue to signify that the result is an R NULL, and to return a null pointer to signify some other eventuality such as an error. Note that in CXXR these return values are indistinguishable, and such functions must be redesigned.
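The following self-contained C++ sketch illustrates both points about R_NilValue. The Node type, the SEXP typedef, and the functions VALUE(), value_or_default() and find_value() are hypothetical stand-ins, not the definitions in Rinternals.h; only the null-tolerance of CAR() and CDR() mirrors the CXXR behaviour described above. It shows a caller guarding an accessor that does not tolerate a null pointer, and a lookup function redesigned so that an error is reported through the return value rather than by a null pointer that would be indistinguishable from R_NilValue.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are NOT the definitions in
// Rinternals.h. The only CXXR behaviour mirrored is that R_NilValue is a null
// pointer and that CAR()/CDR() return a null pointer when passed one.
struct Node {
    int value;
    Node* car;
    Node* cdr;
};
typedef Node* SEXP;
#define R_NilValue NULL

SEXP CAR(SEXP x) { return x ? x->car : NULL; }
SEXP CDR(SEXP x) { return x ? x->cdr : NULL; }

// A hypothetical accessor with no null check: passing R_NilValue is a bug.
int VALUE(SEXP x) {
    assert(x != R_NilValue);
    return x->value;
}

// Workaround 1: test for R_NilValue before calling a non-tolerant accessor.
int value_or_default(SEXP x, int fallback) {
    return (x == R_NilValue) ? fallback : VALUE(x);
}

// Workaround 2: a function that used to return R_NilValue for "no match" and
// a null pointer for "error" must report errors separately. Here a negative
// key stands in for an error condition; the result is passed via *result.
bool find_value(SEXP list, int key, SEXP* result) {
    assert(result != NULL);
    if (key < 0)
        return false;                  // error
    for (SEXP s = list; s != R_NilValue; s = CDR(s)) {
        if (VALUE(s) == key) {
            *result = s;
            return true;               // found
        }
    }
    *result = R_NilValue;              // R NULL: a legitimate "no match"
    return true;
}
```

Returning a status flag separately is only one possible redesign; signalling the error with error() would be another.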

  • SEXPTYPE is now an enumeration, rather than a typedef for unsigned int, but the numerical values of particular SEXPTYPEs are unchanged. (This change appears to have been under contemplation within CR.) This may necessitate some explicit conversions or changes in the types of variables.
  • In CXXR, VECTOR_ELT() and SET_VECTOR_ELT() can only be applied to SEXPs of type VECSXP (implemented internally using class CXXR::ListVector). In particular, these functions cannot be applied to EXPRSXPs, for which the new functions XVECTOR_ELT() and SET_XVECTOR_ELT() should be used instead.
  • In CXXR, the tail (CDR) of any ConsCell object (i.e. LISTSXP, LANGSXP, DOTSXP or BCODESXP) must be of class PairList (i.e. LISTSXP specifically).
  • SET_TYPEOF() has been abolished.
  • allocSExp() can be used to create objects only of ConsCell types.
  • Functions SET_FORMALS() and SET_BODY() are no longer available: the formals and body of a closure must be set at the time the closure is created (e.g. using mkCLOSXP()).
  • Functions SET_PRENV() and SET_PRCODE() are no longer available; the environment and code of a Promise must be set at the time it is created (e.g. using mkPROMISE()). SET_PRVALUE() is still available (for the time being), and will automatically null the environment pointer if the value is set to anything other than R_UnboundValue.
  • CR makes no check that the argument of LENGTH() points to a vector object. CXXR does check this (unless UNCHECKED_SEXP_DOWNCAST is defined); however, a null pointer argument is acceptable, in which case (as noted above) LENGTH() simply returns 0.
  • In CXXR, SET_ATTRIB(x, v) does not simply plug its (list) argument v into the attribute field of x; instead it presents the elements of the list in sequence to RObject::setAttribute(), which verifies that class invariants are preserved. Consequently, altering v after calling SET_ATTRIB(), as is currently done in model.c in CR, may well not have the desired effect, and is deprecated. (Although in CXXR, unlike CR, SET_ATTRIB() may allocate memory, it has been engineered so that this will never result in a mark-sweep garbage collection; this is to avoid breaking existing code.)
  • Attempting to apply SET_ATTRIB() to a cached CHARSXP raises an error: such objects should be regarded as immutable. Applying SET_ATTRIB() to a SYMSXP is strongly discouraged, and may raise an error in future.
  • SET_OBJECT() has been abolished. The m_has_class field of RObject is maintained automatically by the RObject class interface, according to whether or not a class attribute is set.
  • RDEBUG() and SET_RDEBUG() are applicable only to closures; use ENV_DEBUG() and SET_ENV_DEBUG() to query/control single-stepping within environments.
  • TRACE() and SET_TRACE() are applicable only to FunctionBase objects (i.e. CLOSXP, SPECIALSXP and BUILTINSXP) and will raise an error if used otherwise. However, as noted above TRACE() may also be applied to a null pointer, in which case it returns 0.
  • PRSEEN() and SET_PRSEEN() are no longer available. The relevant code should instead use the interface of class Promise directly.
  • ENVFLAGS(), HASHTAB(), SET_ENCLOS(), SET_ENVFLAGS(), SET_FRAME() and SET_HASHTAB() are no longer available. The relevant code should instead use the interface of class Environment directly.
  • FRAME() produces on the fly a representation of an Environment's frame as a PairList; it is no longer a simple accessor function. Consequently its return value will need specific protection from garbage collection: you cannot rely for this on the fact that the Environment itself is protected.
  • LEVELS() and SETLEVELS() should be used only during serialization and deserialization respectively. This reflects the fact that the 'general purpose' field of CR (field gp of sxpinfo_struct) has been replaced by various special-purpose fields, each placed as far down the RObject class hierarchy as is practical.
  • CXXR does not provide a tag field for objects of type S4Object (S4SXP). (According to the 'R Internals' document, CR apparently does, but this does not appear to be implemented consistently: for example, at least as of CR 2.7.2, duplicate() does not duplicate the tag field.)
  • As noted above for R code, CXXR does not permit the S4 object flag to be unset (e.g. using RObject::setS4Object(false)) for an object of type S4Object (S4SXP). (This restriction is currently in abeyance.)
  • Function Rf_mkSYMSXP() is no longer available. The relevant code should instead use Symbol::obtain() to obtain a pointer to a symbol (this enforces the requirement that there should be at most one standard symbol with a given name); Symbol::makeDDSymbol() is also available to create a dot-dot symbol.
  • BINDING_IS_LOCKED(), LOCK_BINDING(), UNLOCK_BINDING() and SET_ACTIVE_BINDING_BIT() are now applicable only to objects of type PairList (LISTSXP), and should be used only in connection with the serialization and deserialization of environments. Similarly, IS_ACTIVE_BINDING() is now only applicable to objects of a type derived from ConsCell. In particular, these functions are not applicable to symbols (SYMSXP). This reflects the fact that the base environment is now a regular environment, rather than being implemented via the contents of Symbol objects. For the same reason, SYMVALUE() and SET_SYMVALUE() now simply look up or set the value of a symbol in the base environment.
  • In CR it is an error to apply Rf_eval() to a CHARSXP; in CXXR it is not an error, and doing so returns a pointer to the CHARSXP (String) object itself, with its NAMED field increased if necessary to 2, i.e. this case is handled in the same way as REALSXP etc.
  • When CR calls a BUILTINSXP function, it coerces any tags in the argument list to be SYMSXP objects; CXXR does not. (But preferred practice in CXXR is for tags always to be Symbol (SYMSXP) objects anyway.)
  • R_RestartToken does not exist in CXXR.
  • The function Rf_countContexts() is not available in CXXR.
  • The function Rf_applyClosure() no longer exists; instead use the interface of class Closure directly.
  • The functions R_bcEncode() and R_bcDecode() no longer exist. (When 'threaded code' is in use, CXXR stores the threaded form of the bytecode inside a ByteCode object alongside the unthreaded form; this threaded form is created automatically by the class constructor, and is not visible outside the class.) Typically, in CXXR R_bcEncode(x) can be replaced simply by x, and similarly for R_bcDecode(x).
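To illustrate the SEXPTYPE change, here is a C++ sketch using a cut-down stand-in for the enumeration (only a few enumerators, with CR's numerical values). C accepts an implicit conversion from an integer to an enumeration type, but C++ does not, so code that stored type codes in unsigned int variables may need explicit casts such as the one in type_from_code(), a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical stand-in for CXXR's SEXPTYPE enumeration, listing only a few
// enumerators. The numeric values are CR's, which CXXR leaves unchanged.
enum SEXPTYPE {
    NILSXP  = 0,
    SYMSXP  = 1,
    LISTSXP = 2,
    REALSXP = 14,
    STRSXP  = 16,
    VECSXP  = 19,
    EXPRSXP = 20,
    S4SXP   = 25
};

// When SEXPTYPE was a typedef for unsigned int, a raw type code could be
// assigned directly. C++ has no implicit integer-to-enum conversion, so
//     SEXPTYPE t = code;
// no longer compiles, and an explicit cast is needed.
SEXPTYPE type_from_code(unsigned int code) {
    assert(code <= S4SXP);  // crude sanity check for this sketch only
    return static_cast<SEXPTYPE>(code);
}

// The reverse conversion, enum to integer, is still implicit.
unsigned int code_from_type(SEXPTYPE t) {
    return t;
}
```

Where practical, declaring such variables as SEXPTYPE in the first place removes the need for the cast.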

Suggested changes:

  • Redesignate your code as C++ :-) : this will enable more functions to be inlined, and open the way to future benefits. At the very least, remove features of your code that gratuitously prevent it being compiled by a C++ compiler: identifiers called class, this, new or private, for example (all of which occur in CR!). But remember to use extern "C" appropriately in your header files, to prevent C++ mangling of the names of functions intended to be visible to C code.
  • Rework code as necessary to prevent compiler warnings that a const qualifier is being discarded.
  • Use const qualifiers, especially to function arguments, wherever appropriate.
  • Avoid assuming that SYMSXP objects (i.e. symbols) are automatically protected against garbage collection (though for the present they are).
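A minimal sketch of a C routine moved to C++, assuming a hypothetical package function mypkg_sum(): the header wraps its declarations in extern "C" (guarded by __cplusplus so that a C compiler still accepts it), and the array parameter is const-qualified because the routine does not modify it.

```cpp
#include <cassert>

/* --- Hypothetical header mypkg.h, usable from both C and C++ --- */
#ifdef __cplusplus
extern "C" {
#endif

/* const: the routine promises not to modify the caller's array. */
double mypkg_sum(const double* x, int n);

#ifdef __cplusplus
}  /* extern "C" */
#endif

// --- Implementation, now in a .cpp file ---
// Because the declaration above has C linkage, this definition does too, so
// C code can still link against the unmangled name mypkg_sum. A C variable
// or parameter named class, this, new or private would have to be renamed
// here, since those are C++ keywords.
double mypkg_sum(const double* x, int n) {
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += x[i];
    return total;
}
```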

FORTRAN Code

Mandatory changes:

FORTRAN code that calls (directly or indirectly) certain CXXR internal functions that are now implemented in C++ should be compiled in such a way that C++ exceptions are propagated correctly. For example, using gfortran this can apparently be achieved by specifying the compiler flag -fexceptions (though it is not well documented). The most likely case where this will be necessary is in code that calls rexit(); without this change, the R interpreter may not return correctly to the top-level R prompt following an error. (It is intended to remove this requirement in a future release.)

C++ Code

It is a central intention of CXXR to make a wide range of functionality, over and above that offered by CR, available to C++ packages via the $(R_HOME)/include/CXXR API. However, package authors should be aware that this API is currently in a state of considerable flux from release to release. If you are exploiting this API please let me know, so that I can take your requirements on board, and forewarn you of upcoming changes.

Porting to Other Platforms

The following are areas where portability has been traded for efficiency, simplicity and/or clarity in new code generated for CXXR. There may be other unportabilities in code inherited from CR.

  • In places, code using std::list may assume that when a node is spliced from one list to another, an iterator pointing to that node remains valid (though it is now an iterator within the destination list rather than within the source list). This assumption is specifically contrary to ISO14882:1998 and ISO14882:2003, though this has been identified as a defect in the standard, rectified in the draft 'C++ 0x' standard (since published as ISO/IEC 14882:2011, i.e. C++11).
  • In places the CXXR makefiles assume that the following features of GNU make are available:
    • Appending to make variables using +=.
    • Including other files into a makefile using the include directive; the included files are remade by make automatically if necessary.
    • Pattern rules (using % as a placeholder).
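The std::list assumption can be made concrete with a short C++ sketch; move_to_back() is a hypothetical helper, not CXXR code. It returns the original iterator after splicing, which is only guaranteed correct under the C++11 rules (so it assumes a C++11 or later compiler).

```cpp
#include <cassert>
#include <list>

// Moves the element at `it` from `src` to the end of `dst`, and returns `it`
// as an iterator to the moved element. This relies on the iterator staying
// valid after splice() and now referring into `dst`: not guaranteed by
// C++98/03, but guaranteed by C++11.
std::list<int>::iterator move_to_back(std::list<int>& dst,
                                      std::list<int>& src,
                                      std::list<int>::iterator it) {
    assert(it != src.end());  // splicing the end() iterator is undefined
    dst.splice(dst.end(), src, it);
    return it;
}
```

Under the older rules, a portable alternative would be to re-derive the iterator from dst after the splice (for example by decrementing a copy of dst.end()).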

Copyright © Andrew Runnalls 2008-12