Carbon’s Successor Strategy

From C++ interop to memory safety

Chandler Carruth
@chandlerc1024
chandlerc@{google,gmail}.com

CppNow 2023

Carbon Language

An experimental successor to C++

Carbon goals as a successor language

Starts with our goals for C++ in https://wg21.link/p2137r0:

  • Performance-critical software
  • Software and language evolution
  • Code that is easy to read, understand, and write
  • Practical safety and testing mechanisms
  • Fast and scalable development
  • Modern OS platforms, hardware architectures, and environments

Carbon goals as a successor language

  • Performance-critical software
  • Software and language evolution
  • Code that is easy to read, understand, and write
  • Practical safety and testing mechanisms
  • Fast and scalable development
  • Modern OS platforms, hardware architectures, and environments
  • Interoperability with and migration from existing C++ code

Background & overview in the CppNorth talk

Carbon open source project

https://github.com/carbon-language/carbon-lang

  • Removing friction wherever we can for folks to join & contribute
    • GitHub PR (Pull Request) focused workflow, with detailed docs
    • Over 751 PRs merged in the past 12 months, 231 from outside the initial team
    • Active Discord server, both real-time and async discussion
  • Hosting Summer of Code students this summer
  • Dedicated list of good-first-issues for new contributors

Carbon design, evolution, & governance

  • Building a comprehensive, living design document
    • 46 files and over 21k lines of markdown
  • Evolved through 33 GitHub PR proposals in the past 12 months
  • GitHub issue process to discuss and make decisions (59)

Community metrics

  • 36 open & minuted weekly meetings
    • Updates on decisions made and newly requested
    • Updates on proposal RFCs and approvals
    • Summaries of discussions, and other activities
  • Published 3 quarterly transparency reports

Carbon implementation

  • Explorer provides high-level “abstract machine” implementation
  • Toolchain is the expected user-facing implementation
    • Compiling, linking, and other tooling as needed
    • Built on LLVM and all of its technology: Clang, LLD, etc.
    • Many things parse, and early work on semantics & lowering

Carbon’s milestones

  • 0.1: the MVP (Minimum Viable Product) to start evaluating Carbon
    • Focused on complete, functioning C++ interop
  • 0.2: feature complete to enable both finishing evaluations & concluding experiment
    • Notable feature: memory safety
  • 1.0: if the experiment is successful, our production-ready milestone

Carbon’s milestones are over one year in scope

Building for sustainability and the long term

Carbon’s roadmap for 2023

  • Finish the design for 0.1’s features
  • Explorer implements the risky parts of this design
  • Toolchain can build a minimal program mixing C++ and Carbon
  • Share ideas & progress with the C++ community (Hi!)

Carbon’s successor strategy has some important question marks:

  • What do we mean by successor? Why not superset?
  • How do we make this work? C++ interop requires superpowers…
  • How does this get us to memory safety?

Successor language vs. superset language

Or: what kind of successor language is Carbon?

Two main approaches to a successor language:

  1. Connect through intersection

  2. Connect through interoperation

Intersection approach has different forms:

Subset:

[Diagram: Original, Successor]

Example: “Modern” C++

Superset:

[Diagram: Successor, Original]

Example: Circle

Overlap:

[Diagram: Original, Successor]

Example: Cpp2?

Intersection can be implemented as a C++ frontend: cppfront and Cpp2

#include <iostream>
#include <string>

name: () -> std::string = {
  s: std::string = "world";
  decorate(s);
  return s;
}

decorate: (inout s: std::string) = {
  s = "[" + s + "]";
}

auto main() -> int {
  // name();
  std::cout << "Hello " << name() << "\n";
}

// Lowered C++ generated by cppfront:
//=== Cpp2 type declarations
#include "cpp2util.h"

//=== Cpp2 type definitions
//=== and function declarations
#include <iostream>
#include <string>

[[nodiscard]] auto name() -> std::string;

auto decorate(std::string& s) -> void;

auto main() -> int {
  // name();
  std::cout << "Hello " << name() << "\n";
}

//=== Cpp2 function definitions
[[nodiscard]] auto name() -> std::string{
  std::string s {"world"};
  decorate(s);
  return std::move(s);
}

auto decorate(std::string& s) -> void{
  s = "[" + s + "]";
}

Intersection as extensions to a C++ compiler

Good:

  • Sound semantic model
  • More flexibility
  • Avoids two-step compilation
    • Good error messages
    • Faster compile times

Limitations:

  • Complexity of both C++ and new language
  • Hard to separate concerns / design

Aside: fine-grained selection of extensions in C++ successor

Good: allows rapid and incremental experimentation

Problems:

  • Creates serious problems of fragmentation
  • When designing an API, callee and caller need to agree
    • Makes mutually incompatible features really hard
    • Limiting to composable features works but is… limiting
    • Especially if all of existing C++ is included
    • Already a problem causing pain in C++: exceptions, RTTI, LP64 vs. LLP64
  • Applications and libraries will need a cohesive & coherent feature set
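
The fragmentation that already exists in C++ can be observed from inside a translation unit. The sketch below is only an illustration: the `BuildMode` struct and `CurrentBuildMode` function are hypothetical names, while `__cpp_exceptions` and `__cpp_rtti` are the standard feature-test macros.

```cpp
#include <climits>
#include <cstdint>

// Feature-test macros: defined only when the feature is enabled
// for this particular build.
#ifdef __cpp_exceptions
constexpr bool kExceptionsEnabled = true;
#else
constexpr bool kExceptionsEnabled = false;
#endif

#ifdef __cpp_rtti
constexpr bool kRttiEnabled = true;
#else
constexpr bool kRttiEnabled = false;
#endif

// LP64 (most 64-bit Unix-like targets) has a 64-bit long;
// LLP64 (64-bit Windows) keeps long at 32 bits.
constexpr int kLongBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// Hypothetical summary of the settings a caller and callee must agree on.
struct BuildMode {
  bool exceptions;
  bool rtti;
  int long_bits;
};

constexpr BuildMode CurrentBuildMode() {
  return {kExceptionsEnabled, kRttiEnabled, kLongBits};
}

// A fixed-width type is the same size under either data model.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");
```

Two libraries compiled with different answers here cannot freely share an API, which is the caller/callee agreement problem described above.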

Interop successor approaches connect

One connection interop diagram

Intersection constrains and limits the design & implementation

Interoperation is expensive and difficult to build

Intersection constrains and limits both the design & implementation

  • Intersecting features must cover all interactions between original and new
    • The successor inherits at least the technical debt inherent in the overlap
    • Can’t make design improvements to those features
    • Can’t shape non-overlapping designs in a way that conflicts with overlap
  • Implementation has to support the full union of features
    • No separation of concerns or strong abstraction for new designs
    • Hard to ever fully realize the benefits, even where there is no original code

Interoperation is expensive and difficult to build

  • Total cost is higher – requires building & maintaining the interop layer
  • Can require greater complexity in the new language
  • The starting cost and difficulty are especially impacted
    • Have to have a viable whole new language
    • And have to build near complete interop layer
    • No easy incremental paths

Carbon’s strategy is to interoperate with C++

  • More expensive to build
  • But we expect good return on that investment:
    • More flexible language design
    • Strong separation from legacy & tech debt
    • Best possible experience within Carbon code

Seamless C++ interop needs superpowers…

What all do we need for C++ interop?

  • Calling functions (calling conventions, etc.)
  • Passing types with data, including memory layout & model
  • Accessing members, both fields and methods (name lookup)
  • Operators (overloads, ADL) 😨
  • Inheritance, including virtual dispatch
  • Templates… 😱

Clang to the rescue!

  • Can use a full-blown C++ compiler to help build each part
  • Embedded into the Carbon toolchain to connect to each Carbon feature
  • Even has a bunch of extensions and extra features that help!

Basics need to directly map between Carbon and C++, roughly:

                 C++ type                    Carbon type
boolean          bool                        bool
bytes            unsigned char, std::byte    byte
ints             std::intN_t                 iN (i8, …, i64)
unsigned ints    std::uintN_t                uN (u8, …, u64)
floats           std::floatN_t               fN (f16, f32, f64)
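
The C++ side of this table can be checked at compile time. This is a minimal sketch of the C++ column only; `BitWidth` is a hypothetical helper, and the Carbon names appear only in the assertion messages.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Bit width of a C++ type, for comparison with the N in Carbon's iN / uN.
template <typename T>
constexpr int BitWidth() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Each fixed-width C++ type carries its size in its name, which is what
// makes a one-to-one mapping to i8..i64 and u8..u64 plausible.
static_assert(BitWidth<std::int8_t>() == 8, "would map to i8");
static_assert(BitWidth<std::int32_t>() == 32, "would map to i32");
static_assert(BitWidth<std::int64_t>() == 64, "would map to i64");
static_assert(BitWidth<std::uint16_t>() == 16, "would map to u16");

// Both C++ byte-like types occupy exactly one byte (std::byte is C++17).
static_assert(sizeof(unsigned char) == 1, "byte-sized");
static_assert(sizeof(std::byte) == 1, "byte-sized");
```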

Also need to map parameters, but Carbon parameters are different from C++

First, we need to understand Carbon’s expression categories

Carbon’s expression categorization:

  • Value expressions: abstract, read-only values; immutable, no address
  • Reference expressions: mutable objects with storage and an address
    • Durable reference expressions: non-temporary storage, outlive the full expression
    • Ephemeral reference expressions: can refer to temporary storage
  • Initializing expressions: initialize an object within implicitly provided storage
    • Used to model function call expressions (where the function returns)
    • Return directly initializes an object in provided storage
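
For readers coming from C++, the closest familiar behavior to an initializing expression is probably C++17 guaranteed copy elision. The sketch below is an analogy in C++, not Carbon semantics; `Pinned`, `MakePinned`, and `ReadValue` are hypothetical names.

```cpp
// A type that can be neither copied nor moved.
struct Pinned {
  explicit Pinned(int v) : value(v) {}
  Pinned(const Pinned&) = delete;
  Pinned(Pinned&&) = delete;
  int value;
};

// Since C++17, returning a prvalue initializes the caller's object
// directly in the caller's storage; no copy or move constructor is
// needed, which is why this compiles for Pinned.
Pinned MakePinned(int v) { return Pinned(v); }

// Read-only access through a const reference: no copy is made.
int ReadValue(const Pinned& p) { return p.value; }
```

Writing `Pinned p = MakePinned(7);` constructs `p` in place, roughly the role the slides give to a function call expression that initializes implicitly provided storage.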

Carbon parameters and expression categories

  • Parameters are modeled with pattern matching in Carbon
  • By default, parameter patterns match value expressions
    • Bind a name to a read-only, abstract value
  • Parameter variable patterns (marked with var) create local storage
    • These patterns match initializing expressions for their storage

Extend our mappings on function boundaries:

                        C++ parameter type    Carbon parameter pattern
C++ const &             const T&              T (value)
Unmodified by-value     T or const T          T
C++ mutated by-value    T                     var T (variable)
C++ references          T&                    T* (non-null pointer)
C++ pointers            T*                    T*? (optional pointer)
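
Each row corresponds to a familiar C++ signature. The functions below are hypothetical examples of the C++ column, with comments noting the Carbon pattern the table pairs them with.

```cpp
#include <cstddef>
#include <string>

// const T&: read-only access; paired with a Carbon value parameter `T`.
std::size_t Length(const std::string& s) { return s.size(); }

// T by value, never modified: also paired with a value parameter `T`.
int Twice(int x) { return 2 * x; }

// T by value, modified locally: the callee owns a private copy,
// paired with `var T`.
std::string Bracket(std::string s) {
  s.insert(0, "[");
  s += "]";
  return s;
}

// T&: mutates the caller's object and cannot be null,
// paired with a non-null pointer `T*`.
void Increment(int& x) { ++x; }

// T*: may be null, paired with an optional pointer `T*?`.
int ValueOr(const int* p, int fallback) { return p ? *p : fallback; }
```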

Ready to call C++ from Carbon
Let’s break it down:

  1. Import from C++ with Clang
  2. Map it into a Carbon construct
  3. Use it from Carbon
  4. Synthesize and compile a C++ use with Clang

Import from C++ with Clang

// cat_main.carbon
import Cpp library "cat.h"

fn Run() {
  // Normal Carbon call:
  Cpp.Meow();
}
// cat.h
void Meow();

Map it into a Carbon construct

// cat_main.carbon
import Cpp library "cat.h"

fn Run() {
  // Normal Carbon call:
  Cpp.Meow();
}
// cat.h                        
void Meow();
// Synthesized C++
export module carbon_cat_main;

// Use modules tech to import a header, 
// and make it available to Carbon.
export import "cat.h";

// Also make a hook available to Carbon 
export extern "CarbonMagic"
void Call_Meow() {
  // Synthesize the C++ use here,
  // where it can be compiled as C++:
  Meow();
}
// Synthesized Carbon
package Cpp api

fn Meow() {
  // Call the synthesized low-level hook:
  Call_Meow();
}
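
The hook can be approximated in standard C++ today. This sketch assumes `extern "C"` as a stand-in for the slides' `"CarbonMagic"` linkage (which is not real C++), and gives `Meow` a body plus a hypothetical `CallLog` so the effect is observable.

```cpp
#include <string>
#include <vector>

// Hypothetical log so the call has a visible effect.
std::vector<std::string>& CallLog() {
  static std::vector<std::string> log;
  return log;
}

// Stand-in for the function declared in cat.h.
void Meow() { CallLog().push_back("Meow"); }

// Stand-in for the synthesized hook: a simple, unmangled symbol that
// another language's compiler could call without understanding C++.
extern "C" void Call_Meow() {
  // The actual C++ use is compiled here, as ordinary C++.
  Meow();
}
```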

See, it’s easy! No problem! 🤡

More interesting: methods and fields!

Import from C++ with Clang

// cat_meow.carbon
import Cpp library "cat.h"

fn MeowAndGetLives(c: Cat) -> i32 {
  c.Meow(4.2);
  return c.lives;
}
// cat.h
struct Cat {
  void Meow(const float vol) const;

  std::int32_t lives;
};

Map it into a Carbon construct

// cat_meow.carbon
import Cpp library "cat.h"

fn MeowAndGetLives(c: Cat) -> i32 {
  c.Meow(4.2);
  return c.lives;
}
// cat.h
struct Cat {
  void Meow(const float vol) const;

  std::int32_t lives;
};
// Synthesized C++
export module carbon_cat_meow;
export import "cat.h";

export extern "CarbonMagic"
void Call_Cat_Meow(const Cat &c, float vol) {
  // Method call handled here:
  c.Meow(vol);
}

export extern "CarbonMagic"
std::int32_t Read_Cat_Lives(const Cat &c) {
  // Layout and offset here:
  return c.lives;
}
// Synthesized Carbon
package Cpp api

class Cat {
  fn Meow[self: Self](vol: f32) {
    Call_Cat_Meow(self, vol);
  }

  // Eventually, a property:
  // ``=> Read_Cat_Lives(c);``
  var lives: i32;
}
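
The "layout and offset" comment can be made concrete in plain C++. In this sketch `extern "C"` again stands in for `"CarbonMagic"`, the hooks take pointers rather than references, and `g_last_volume` is a hypothetical global added so the method call is observable.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static float g_last_volume = 0.0f;  // records the last Meow() call

// Stand-in for cat.h, with a body for Meow so the sketch is complete.
struct Cat {
  void Meow(float vol) const { g_last_volume = vol; }
  std::int32_t lives;
};

// Method call: overload resolution and the implicit `this` argument are
// handled by the C++ compiler inside the hook.
extern "C" void Call_Cat_Meow(const Cat* c, float vol) { c->Meow(vol); }

// Field read as a low-level layer would see it: a load at a fixed offset
// from the object's address. The C++ compiler supplies the offset, so
// the other side never computes C++ layout on its own.
extern "C" std::int32_t Read_Cat_Lives(const Cat* c) {
  std::int32_t lives;
  std::memcpy(&lives,
              reinterpret_cast<const char*>(c) + offsetof(Cat, lives),
              sizeof(lives));
  return lives;
}
```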

ADL and operator overloading! 😨

Import from C++ with Clang

// cat.h
struct Cat { ... };

Cat operator+(const Cat& lhs,
              const Cat& rhs) {
  Cat result;
  result.lives =
      lhs.lives + rhs.lives;
  return result;
}
// cat_sum.carbon
import Cpp library "cat.h"

fn SumCatsSomehow(c1: Cat,
                  c2: Cat) -> Cat {
  // No idea why we're adding cats...
  return c1 + c2;

}

Map it into a Carbon construct

// cat_sum.carbon
import Cpp library "cat.h"

fn SumCatsSomehow(c1: Cat,
                  c2: Cat) -> Cat {
  // No idea why we're adding cats...
  return c1 + c2;
  // In Carbon, this calls ``Op`` below.
}
// cat.h
struct Cat { ... };

Cat operator+(const Cat& lhs,
              const Cat& rhs) {
  Cat result;
  result.lives =
      lhs.lives + rhs.lives;
  return result;
}
// Synthesized C++
export module carbon_cat_sum;
export import "cat.h";

export extern "CarbonMagic"
Cat Call_Cat_Op_Plus(const Cat &lhs,
                     const Cat &rhs) {
  // We compile the operator here, so we
  // get whatever C++ ADL would find.
  return lhs + rhs;
}
// Synthesized Carbon
package Cpp api

class Cat { ... }

// We can find ``operator+`` in C++,
// so we synthesize a Carbon operator.
impl Cat as Core.AddWith(Cat) {
  fn Op[self: Self](rhs: Cat) -> Cat {
    return Call_Cat_Op_Plus(self, rhs);
  }
}
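
The reason the hook "gets whatever C++ ADL would find" is easiest to see when the operator lives in a namespace. In this sketch the `pets` namespace is a hypothetical addition (the slides keep `Cat` at global scope), and the hook is an ordinary function rather than a `"CarbonMagic"` one.

```cpp
namespace pets {

struct Cat {
  int lives;
};

// Declared next to Cat, so argument-dependent lookup can find it.
inline Cat operator+(const Cat& lhs, const Cat& rhs) {
  return Cat{lhs.lives + rhs.lives};
}

}  // namespace pets

// The hook lives outside `pets` and names no operator explicitly.
// `lhs + rhs` is resolved by ordinary C++ lookup, including ADL, so the
// hook never needs to reimplement those rules.
pets::Cat Call_Cat_Op_Plus(const pets::Cat& lhs, const pets::Cat& rhs) {
  return lhs + rhs;
}
```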

TEMPLATES! LET’S GOOOOO!!! 😱😱😱

Import from C++ with Clang

// cat.h
struct Cat { ... };

template <typename T> struct Vector {
  // ...
  template <typename U> void Push(U x) { ... } 
};

Vector<Cat> global_cats;
// global_cats.carbon
import Cpp library "cat.h"

fn AddGlobalCat(c: Cpp.Cat) {
  Cpp.global_cats.Push(c);
}

Map it into a Carbon construct

// global_cats.carbon
import Cpp library "cat.h"

fn AddGlobalCat(c: Cpp.Cat) {
  Cpp.global_cats.Push(c);
}
// cat.h
struct Cat { ... };

template <typename T> struct Vector {
  // ...
  template <typename U> void Push(U x) { ... } 
};

Vector<Cat> global_cats;
// Synthesized C++
export module carbon_global_cats;
export import "cat.h";

// Generated for each instantiation
// of ``T`` and ``U``, here both are ``Cat``.
export extern "CarbonMagic"
void Call_Vector_Cat_Push_Cat(
    Vector<Cat> *self,
    Cat *x) {
  // Provide C++ R-value-ref move:
  self->Push(std::move(*x));
}
// Synthesized Carbon
package Cpp api

class Cat { ... }

class Vector(template T:! type) {
  // ...
  fn Push[addr self: Self*,
          template U:! type](var x: U) { 
    Call_Vector_T_Push_U(self, &x);
  }
}

var global_cats: Vector(Cat);
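
A per-instantiation hook can be written by hand in standard C++ to see what it does. This sketch fills in the elided bodies with assumptions: `Vector` stores its elements in a `std::vector`, and `Cat` counts copies and moves so the effect of `std::move` is visible.

```cpp
#include <utility>
#include <vector>

// Hypothetical Cat that counts how it is transferred.
struct Cat {
  Cat() = default;
  Cat(const Cat&) { ++copies; }
  Cat(Cat&&) noexcept { ++moves; }
  static int copies;
  static int moves;
};
int Cat::copies = 0;
int Cat::moves = 0;

// Assumed shape of the elided template from cat.h.
template <typename T>
struct Vector {
  template <typename U>
  void Push(U x) {
    items.push_back(std::move(x));
  }
  std::vector<T> items;
};

// One concrete, non-template hook per instantiation that is used;
// here T = Cat and U = Cat. Template instantiation stays inside C++.
void Call_Vector_Cat_Push_Cat(Vector<Cat>* self, Cat* x) {
  // The caller handed over storage it owns, so move out of it.
  self->Push(std::move(*x));
}
```

Because the hook moves from `*x`, pushing a `Cat` this way should never invoke the copy constructor.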

There is a pattern to this approach:

  • Carbon constructs provide a Carbon API for C++ imports
  • C++ constructs implement the C++ behavior of that API
  • Carbon’s compiler synthesizes a low-level, simplified connection layer
    • Because the connection is never user-visible, it can cheat a lot
    • Example: generate manually during instantiation

Calling Carbon from C++? Same idea:

  • Carbon will build a C++ module or header to expose Carbon to C++
  • Synthesizing C++ constructs to model the C++ API for a Carbon import
  • Map through low-level connection layer to fully Carbon behavior

What about that low-level connection layer?

We already have it: LLVM!

LLVM is the glue that holds C++ interop together

  • Already know we can lower both Carbon and C++ into LLVM
    • Guaranteed to be able to represent everything
  • Unconstrained by source, can select optimal representation
  • LLVM’s optimizer can inline and optimize away overhead

Also provide a fallback of C++ source generation

  • Limited / partial coverage, and more overhead
  • Useful when bridging to other toolchains or new platforms
  • Want to enable shipping a binary Carbon library with a C++ header

This pattern enables so much more:

  • Bundling a C++ toolchain to build the C++ code
    • Allows a custom STL ABI to transparently map more types
  • Transparent mapping of views and non-owning wrappers on API boundaries
  • Ranges and iteration mapping
  • Inheritance, virtual dispatch, v-tables
  • Translating error handling both to & from exceptions
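
As one illustration of the last point, a boundary function can catch C++ exceptions and return a status instead. This is a sketch of a common C++ technique, not Carbon's design; `Status`, `ParseLives`, and `Call_ParseLives` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status values for the boundary.
enum class Status { kOk = 0, kInvalid = 1, kOutOfRange = 2 };

// Ordinary C++ API that reports failure by throwing.
int ParseLives(const std::string& text) { return std::stoi(text); }

// Boundary wrapper: noexcept, with every exception mapped to a status so
// nothing propagates into code that does not use exceptions.
Status Call_ParseLives(const char* text, int* out) noexcept {
  try {
    *out = ParseLives(text);
    return Status::kOk;
  } catch (const std::invalid_argument&) {
    return Status::kInvalid;
  } catch (const std::out_of_range&) {
    return Status::kOutOfRange;
  } catch (...) {
    return Status::kInvalid;
  }
}
```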

Memory safety

What do we mean by memory safety?

Bugs, safety, and safety bugs

  • Bugs: program behavior contrary to the author’s intent
    • Software, in practice, always has bugs – we must plan for them
  • Safety: invariants or limits on program behavior in the face of bugs
  • Safety bugs: bugs where some aspect of program behavior has no invariants or limits
    • Checking for an unexpected value and calling abort(): detects a bug, but is safe
    • Calling std::unreachable() is also a bug, but unsafe and a safety bug
  • Initial bug: the first deviation of program behavior
    • Buggy behavior often causes more buggy behavior – all are bugs
    • Our focus is on fixing the initial bug
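
The abort() versus std::unreachable() distinction can be shown in a few lines. `Mood` and `MoodName` are hypothetical; the point is only what happens once an earlier bug has produced an invalid value.

```cpp
#include <cstdlib>

enum class Mood { kHappy, kGrumpy };

const char* MoodName(Mood m) {
  switch (m) {
    case Mood::kHappy:
      return "happy";
    case Mood::kGrumpy:
      return "grumpy";
  }
  // Reaching this point means some earlier bug produced an invalid Mood.
  // abort() gives that bug a defined outcome: the program stops.
  // Replacing it with std::unreachable() (C++23) would make the outcome
  // undefined, turning the same bug into a safety bug.
  std::abort();
}
```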

Safety, bugs, and security vulnerabilities

  • Security vulnerabilities: ability of a malicious user to subvert a program’s behavior, typically through exploiting bugs
    • Detecting: while still vulnerable, exploits of a bug can be detected or tracked
    • Mitigating: making a vulnerability significantly more expensive, difficult, or improbable to be exploited
    • Preventing: while still a bug, making it impossible to be a vulnerability
    • Fixing: no longer a bug, much less a vulnerability
  • Safety doesn’t require fixing bugs, but it can prevent or mitigate vulnerabilities
    • Correct-by-construction techniques and proofs are a subset of safety techniques,
      essentially preventing a program from even being formed in the face of bugs

Memory safety bugs and security

  • Memory safety bugs: Safety bugs that additionally read or write memory
  • A focus because they are the dominant cause of security vulnerabilities
    • Over 65% of high / critical vulnerabilities (sources 1,2,3,4,5,6)
  • Memory safety: limits program behavior to only read or write intended memory, even in the face of bugs
    • Sufficient to mitigate and prevent these classes of vulnerabilities in practice

Classes of memory safety bugs

  • Spatial: memory access outside of an allocated region
  • Temporal: access after the lifetime of the object in memory
  • Type: accessing memory which isn’t a valid representation for a type
  • Initialization: reading memory before it is initialized
  • Data-Race: unsynchronized reads & writes by different threads
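
As a minimal sketch of the first class (spatial safety), the Rust snippet below uses bounds-checked slice access. The helper name `read_at` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: reads `data[index]`, returning `None` instead of
/// touching memory outside the slice.
fn read_at(data: &[i32], index: usize) -> Option<i32> {
    // `get` performs the bounds check, so an out-of-range index can
    // never reach memory outside the allocation.
    data.get(index).copied()
}

fn main() {
    let values = vec![10, 20, 30];
    println!("{:?}", read_at(&values, 1));
    println!("{:?}", read_at(&values, 7));
}
```

The out-of-range call is still a bug in the caller, but it can no longer read unrelated memory.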

Suggested programming language approach to memory safety

A language is rigorously memory-safe if it:

  • has a well-delineated safe subset, and
  • provides spatial, temporal, type, and initialization safety in its safe subset.

This should be the required minimum for programming languages going forward.

Details of rigorous memory safety

  • Safe subset must be a viable default, with unsafe being exceptional
  • Delineated unsafe constructs must be visible and auditable
  • Safety can be through any combination of compile-time and runtime protections
    • However, must prevent vulnerabilities, not just mitigate them
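
One way to picture "visible and auditable" unsafe constructs combined with a runtime protection is the Rust sketch below. The function `sum_prefix` is a hypothetical example, not something from the talk: its signature is safe, and the single `unsafe` block is justified by a check that sits right next to it.

```rust
/// Hypothetical helper: sums the first `n` elements of `data`.
/// Callers only ever see a safe API.
fn sum_prefix(data: &[u32], n: usize) -> Option<u32> {
    // Runtime protection: reject requests past the end of the slice.
    if n > data.len() {
        return None;
    }
    let mut total: u32 = 0;
    for i in 0..n {
        // SAFETY: `i < n <= data.len()` was checked above.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    Some(total)
}

fn main() {
    println!("{:?}", sum_prefix(&[1, 2, 3, 4], 2));
    println!("{:?}", sum_prefix(&[1, 2, 3], 5));
}
```

An auditor only needs to review the `unsafe` block and the check that guards it, rather than every caller.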

Details of rigorous memory safety

  • Data-race safety remains highly desirable but not a strict requirement:
    • It would increase the constraints on the available solutions
    • No evidence (yet) of comparable security risks when other safety is achieved
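
For a sense of what data-race safety looks like in a language that does provide it, here is a small Rust sketch. The helper `count_in_parallel` is hypothetical. Every write to the shared counter goes through a `Mutex`, and safe Rust would reject sharing the counter across threads without such synchronization.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: `threads` threads each add 1 to a shared counter
/// `per_thread` times. The `Mutex` synchronizes every write.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```

The extra machinery (`Arc`, `Mutex`, `move` closures) hints at how this requirement constrains the available designs.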

How can Carbon get us there starting from C++?

First, we need to introduce a safe subset

Best candidate for C++ is likely similar to Rust’s borrow checker

  • High performance: ensures safety at compile-time with the type system
  • Explored in the context of C++’s type system, w/ many barriers
    • Non-destructive moves constantly leave “dangling” references
    • Inconvenient to track in/out “borrows” w/o language support
  • Fundamentally requires a significant increase in type system complexity
    • More parameterized types in C++ means more templates
    • C++ doesn’t have the tools used by Rust (and others): checked generics

Beyond language changes, a borrow checker needs different APIs and idioms

Example differences in a borrow-checked language

// C++
#include <cstddef>
#include <span>
#include <utility>
#include <vector>

void swap_span(std::span<int> a,
               std::span<int> b) {
  for (size_t i = 0;
       i < a.size();
       i += 1) {
    std::swap(a[i], b[i]);
  }
}

int main() {
  std::vector<int> v = {1, 2, 3, 4, 5, 6};
  swap_span(
      std::span(v).subspan(0, 3),
      std::span(v).subspan(3, 3)
  );
}

// Rust
fn swap_span(a: &mut [i32],
             b: &mut [i32]) {
  for i in 0..a.len() {
    std::mem::swap(&mut a[i], &mut b[i])
  }
}

pub fn main() {
  let mut v = vec![1, 2, 3, 4, 5, 6];
  swap_span(
    &mut v[0..3],
    &mut v[3..6],
  )
}

Equivalent code doesn’t work:

Rust error message: error[E0499]: cannot borrow `v` as mutable more than once at a time

https://godbolt.org/z/MG1vE9Yxq

So how would we fix this?

Different code can “work”:

fn swap_span(a: &mut [i32], b: &mut [i32]) {
  for i in 0..a.len() {
    std::mem::swap(&mut a[i], &mut b[i])
  }
}

pub fn main() {
  let mut v = vec![1, 2, 3, 4, 5, 6];

  // Need to make a pointer, without borrowing ``v`` mutably.
  let ptr: *mut i32 = v.as_mut_ptr();

  // Bypassing the borrow checker so that we can make
  // two independent borrowing references from it.
  let first = unsafe { std::slice::from_raw_parts_mut(ptr, 3) };
  let second = unsafe { std::slice::from_raw_parts_mut(ptr.add(3), 3) };

  swap_span(first, second);
}

https://godbolt.org/z/Ehn5se9oP

Even more different code works better:

fn swap_span(a: &mut [i32], b: &mut [i32]) {
  for i in 0..a.len() {
    std::mem::swap(&mut a[i], &mut b[i])
  }
}

pub fn main() {
  let mut v = vec![1, 2, 3, 4, 5, 6];
  let (first, second) = (|v: &mut Vec<i32>| {
    // Need to make a pointer, without borrowing ``v`` mutably (again).
    let ptr: *mut i32 = v.as_mut_ptr();

    // Bypassing the borrow checker so that we can make
    // two independent borrowing references from it.
    let first = unsafe { std::slice::from_raw_parts_mut(ptr, 3) };
    let second = unsafe { std::slice::from_raw_parts_mut(ptr.add(3), 3) };

    (first, second)
  })(&mut v); // Takes a mutable borrow on ``v`` here.

  swap_span(first, second);
}

https://godbolt.org/z/5ov97MYEa

But this no longer resembles C++

// C++
void swap_span(std::span<int> a,
               std::span<int> b) {
  for (size_t i = 0;
       i < a.size();
       i += 1) {
    std::swap(a[i], b[i]);
  }
}

int main() {
  std::vector<int> v = {1, 2, 3,
                        4, 5, 6};
  swap_span(
      std::span(v).subspan(0, 3),
      std::span(v).subspan(3, 3)
  );
}

// Rust
fn swap_span(a: &mut [i32],
             b: &mut [i32]) {
  for i in 0..a.len() {
    std::mem::swap(&mut a[i], &mut b[i])
  }
}

pub fn main() {
  let mut v = vec![1, 2, 3, 4, 5, 6];
  let (first, second) = (|v: &mut Vec<i32>| {
    let ptr: *mut i32 = v.as_mut_ptr();
    let first = unsafe {
      std::slice::from_raw_parts_mut(ptr, 3)
    };
    let second = unsafe {
      std::slice::from_raw_parts_mut(ptr.add(3),
                                     3)
    };

    (first, second)
  })(&mut v);

  swap_span(first, second);
}

It needs different APIs to work well:

// C++
void swap_span(std::span<int> a,
               std::span<int> b) {
  for (size_t i = 0;
       i < a.size();
       i += 1) {
    std::swap(a[i], b[i]);
  }
}

int main() {
  std::vector<int> v = {1, 2, 3, 4, 5, 6};

  swap_span(
      std::span(v).subspan(0, 3),
      std::span(v).subspan(3, 3)
  );
}

// Rust
fn swap_span(a: &mut [i32],
             b: &mut [i32]) {
  for i in 0..a.len() {
    std::mem::swap(&mut a[i], &mut b[i])
  }
}

pub fn main() {
  let mut v = vec![1, 2, 3, 4, 5, 6];

  // Mutably borrows ``v`` once, but produces
  // two independent mutable spans.
  let (first, second) = v.split_at_mut(3);

  swap_span(first, second);
}
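
To make the API point concrete, here is a sketch of how the unsafe pointer code from the earlier slides can sit behind a safe, reusable function. The name `split_in_two` is hypothetical, and this is not claimed to be how the standard library implements `split_at_mut`.

```rust
/// Hypothetical stand-in for `split_at_mut`: the unsafe code is written
/// once, behind a safe signature, instead of at every call site.
fn split_in_two(v: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = v.len();
    // Runtime check that keeps the unsafe block below sound.
    assert!(mid <= len, "mid out of bounds");
    let ptr = v.as_mut_ptr();
    // SAFETY: `[0, mid)` and `[mid, len)` lie inside `v` and do not overlap.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let (first, second) = split_in_two(&mut v, 3);
    // Swap the two halves element by element.
    first.swap_with_slice(second);
    println!("{:?}", v);
}
```

The returned slices borrow from `v` for as long as they are used, so callers get two independent mutable views without writing any `unsafe` themselves.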

Making a safe subset a reasonable default requires breaking changes

  • Current defaults in C++ are too unsafe to be realistically delineated
  • This means a large amount of breaking change
    • Need to move existing unsafe code towards separable constructs
    • Clear space for safe defaults throughout the language
    • Especially disruptive to pointers, references, and the STL

WG21 makes this essentially impossible. 😞

Carbon gives us a viable strategy:

  • Make unsafe Carbon a migration target from C++ w/ great interop
  • Evolve and extend Carbon to have a viable safe subset
  • Migrate unsafe C++ to unsafe Carbon at scale
  • Incrementally rewrite unsafe Carbon to safe Carbon

Need to separate the two migrations

  • Making C++ → Carbon also require unsafe → safe magnifies the costs
    • Especially if not all code or users need to move to safety
    • At that point, should probably just target Rust
  • Interesting space, and focus of experiment, is a two-phase approach
  • Chance to drop the initial cost and scale up overall migration

What will this look like for memory safety? 🤷

Let’s look at a simpler example: null-safety

Null-safety: type-system enforced null pointers

  • Null pointers are tracked in the type system explicitly
  • Code must explicitly check for null before dereferencing
  • Result: no more null pointer bugs
    • Still bugs, and still null pointers!
    • Remaining bugs are incorrectly checking or handling null
    • These are localized, don’t cross APIs, are amenable to static analysis, etc.
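
A minimal sketch of this model, using Rust's `Option` as one existing realization of it. The `Employer` type and the `employer_name` function are hypothetical and only mirror the naming of the examples in this talk.

```rust
struct Employer {
    name: String,
}

/// `Option<&Employer>` makes "may be null" part of the type: the employer
/// cannot be used without handling the `None` case first.
fn employer_name(employer: Option<&Employer>) -> &str {
    match employer {
        Some(e) => e.name.as_str(),
        None => "(unemployed)",
    }
}

fn main() {
    let acme = Employer { name: String::from("Acme") };
    println!("{}", employer_name(Some(&acme)));
    println!("{}", employer_name(None));
}
```

Any remaining bug is in how the `None` arm is handled, which stays local to this function.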

C++ references are partially null safe,
but not enough

class Employer;

class Employee {
public:
  // Problem: can call with a temporary!
  Employee(const Employer& employer) : employer_(employer) {}

private:
  // Problem: can't copy-assign or move-assign even when desired!
  const Employer& employer_;
};

Can extend C++ to add an annotation

// Use a Clang extension to provide nullability.
template <typename T> using NonNull = T _Nonnull;

class Employer;

class Employee {
public:
  Employee(NonNull<const Employer*> employer) : employer_(employer) {}

private:
  // Pointer-to-const (not a const pointer), so the member stays assignable.
  NonNull<const Employer*> employer_;
};

Can extend C++ to add an annotation

// Use a Clang extension to provide nullability.
template <typename T> using NonNull = T _Nonnull;
template <typename T> using Nullable = T _Nullable;

class Employer;

class Employee {
public:
  Employee(NonNull<const Employer*> employer) : employer_(employer) {}

  void ChangeEmployer(NonNull<const Employer*> new_employer) {
    previous_employer_ = employer_;
    employer_ = new_employer;
  }

private:
  // Pointers-to-const rather than const pointers, so that
  // ``ChangeEmployer`` can reassign them.
  NonNull<const Employer*> employer_;

  Nullable<const Employer*> previous_employer_ = nullptr;
};

Can even establish a way to shift the default!

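// Use a Clang extension to provide nullability.
// Illustrative pragma spelling; shipping Clang provides
// ``#pragma clang assume_nonnull begin`` / ``end`` regions.
#pragma clang assume_pointers(Nonnull)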
template <typename T> using Nullable = T _Nullable;

class Employer;

class Employee {
public:
  Employee(const Employer* employer) : employer_(employer) {}

  void ChangeEmployer(const Employer* new_employer) {
    previous_employer_ = employer_;
    employer_ = new_employer;
  }

private:
  const Employer* employer_;

  // Pointer-to-const rather than a const pointer, so it can be reassigned.
  Nullable<const Employer*> previous_employer_ = nullptr;
};

Limits of doing this in C++:

  • A lot of effort and distraction due to wrong defaults
  • Superficial simplicity, but deep complexity
    • Smart pointers: unique_ptr, shared_ptr, …
    • Conversions: const, derived-to-base, …
  • Expressive limits: can’t overload on nullability

And this is a best-case-scenario:
Well factored, fairly simple code

Carbon models nullable pointers as optional pointers

  • Cannot dereference a nullable pointer: it’s not a pointer!
  • Trivial to make unwrapping syntax make the potential for null obvious
  • As a full type, fully supported in the type system (overloads, etc)
  • Can build up smart pointers to consistently incorporate this model

Even in simple cases, we get nicer syntax:

class Employer;

class Employee {
  fn Make(employer: const Employer*) -> Employee {
    return {.employer = employer};
  }

  fn ChangeEmployer[addr self: Self*](new_employer: const Employer*) {
    // If we allow direct assignment to an optional like C++ does:
    self->previous_employer = self->employer;

    self->employer = new_employer;
  }

  private var employer: const Employer*;

  // Makes an optional pointer with ``T*?``, defaults to null.
  private var previous_employer: const Employer*?;
}

Benefits of the model compound with advanced language features

  • Pattern matching can be designed for testing & unwrapping
  • Can layer on control-flow constructs, as in Rust, that improve this further:
    • if let to test and unwrap
    • let else to test and unwrap with early exit
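
For reference, the two Rust constructs named above look like this in practice. The function names are hypothetical, and `Option<&str>` stands in for an optional pointer.

```rust
/// Hypothetical helper: `if let` tests and unwraps in one step.
fn describe(previous_employer: Option<&str>) -> String {
    if let Some(name) = previous_employer {
        format!("previously at {name}")
    } else {
        String::from("no previous employer")
    }
}

/// Hypothetical helper: `let else` unwraps, or exits early when empty.
fn name_length(previous_employer: Option<&str>) -> usize {
    let Some(name) = previous_employer else {
        return 0;
    };
    // From here on, `name` is known to be present.
    name.len()
}

fn main() {
    println!("{}", describe(Some("Acme")));
    println!("{}", describe(None));
    println!("{}", name_length(Some("Acme")));
}
```

In both forms the unwrapped value is only in scope where the test has succeeded.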

Migration strategy for null-safety:

  1. Clean up C++ to be close to desired model
    • May use extensions or annotation systems
    • May not get full fidelity, coverage, or benefits
  2. Migrate at-scale from C++ to Carbon
    • Specifically with any remaining null-unsafety
  3. Incrementally refactor towards null-safety
    • Redesign APIs as needed, leveraging language facilities

This is a pattern that we want to repeat

  • Reduce the gap and improve migration using C++ annotations & extensions
  • Large scale migration of code as-is from C++ to Carbon
  • Incremental and focused improvements with new features

Null-unsafe C++  →  Null-unsafe Carbon  →  Null-safe Carbon

Memory-unsafe C++  →  Memory-unsafe Carbon  →  Memory-safe Carbon

C++ templates  →  Carbon templates  →  Checked generics (see tomorrow’s talk!)

What is Carbon’s successor strategy?

Start with:

C++

C++ interop & migration:

   ↓

Carbon

Incremental refactoring:

   ↓

Better Carbon

Memory safety, generics, etc.

Resources: