4 Block Results
Glen Whitney edited this page 2024-09-12 23:22:44 +00:00

In Rust, a block that ends with a semicolon evaluates to (), and a block that doesn't end with a semicolon evaluates to the value of its last expression. In Husht, we want semicolons to be unnecessary, so how do we make this distinction? Here are some ideas.
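As a reminder of the Rust behavior in question, here is a minimal sketch. The names `double`, `value_block`, and `unit_block` are made up for illustration and do not come from the Husht codebase.

```rust
// Hypothetical helper so the blocks below have something to evaluate.
fn double(n: i32) -> i32 {
    n * 2
}

// The inner block's last expression has no semicolon,
// so the block evaluates to that expression's value.
fn value_block() -> i32 {
    let v = {
        let a = 3;
        double(a) // no trailing semicolon: the block evaluates to 6
    };
    v
}

// The inner block's last expression ends with a semicolon,
// so the value is discarded and the block evaluates to ().
fn unit_block() {
    let u: () = {
        let a = 3;
        double(a); // trailing semicolon: the block evaluates to ()
    };
    u
}
```

The only difference between the two inner blocks is the final semicolon, which is exactly the information Husht needs to recover by some other means.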

Glen added a proposal at the bottom

Explicitly mark results

To indicate that a block evaluates to its last expression, you mark the last expression with a special token. This inverts the Rust convention, in the sense that you indicate a block result by the presence of a token rather than the absence of one.

Pros

  • Simple rule
  • Doesn't require type-checking in the transpiler
  • Tight correspondence with Rust syntax

Cons

  • Adds a little noise—but block results usually make up a small fraction of expressions, so noise should still decrease overall
  • Prevents some Rust code from being valid Husht

Here are some ideas for result markers.

Arrow away from result

"Ship out." For function blocks, can be seen as the input end of the return type arrow in the signature.

let q = x*x - y*y
if q >= 0
    <- x
else
    <- y

Arrow toward result

"Out of the pipe." For function blocks, can be seen as standing in for the return type in the signature.

let q = x*x - y*y
if q >= 0
    -> x
else
    -> y

Dollar sign

"Cash out."

let q = x*x - y*y
if q >= 0
    $ x
else
    $ y
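Whichever marker is chosen, the three snippets above would presumably transpile to the same Rust. The sketch below is an assumption about that output, not a specification: the function name `pick`, the `i32` types, and the `chosen` binding are all invented here. The binding is used only to sidestep the question of whether the `if` itself would also need a marker.

```rust
// Hypothetical Rust output for the marked Husht snippets above.
fn pick(x: i32, y: i32) -> i32 {
    let q = x * x - y * y;
    let chosen = if q >= 0 {
        x // marked in Husht, so no semicolon is emitted
    } else {
        y // marked in Husht, so no semicolon is emitted
    };
    // An unmarked final expression would instead be emitted with a
    // semicolon, making its block evaluate to ().
    chosen
}
```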

Allow implicit coercion to Unit

There are few or no circumstances where Rust allows implicit coercion to Unit, but Husht could be more permissive about this—perhaps only in particular coercion sites. This might allow us to infer most of the time whether a block should be treated as though the last expression ends with a semicolon. However, it would probably also lead to ambiguity in weird but not totally unreasonable code like the following, where we could infer either i32 or Unit for the type of result.

fn say_hello
    println! "Hello"
    0

fn say_goodbye
    println! "Goodbye"
    1

fn say_something(arriving: bool)
    let result = if arriving
        say_hello()
    else
        say_goodbye()
    /* function body continues... */
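To make the two readings concrete, here is a sketch of the Rust each one might correspond to. The `-> i32` return types and the split into `say_something_value` and `say_something_unit` are assumptions made for illustration; the Husht source above does not declare return types.

```rust
fn say_hello() -> i32 {
    println!("Hello");
    0
}

fn say_goodbye() -> i32 {
    println!("Goodbye");
    1
}

// Reading 1: the branches keep their values, so `result` is an i32.
fn say_something_value(arriving: bool) -> i32 {
    let result = if arriving {
        say_hello()
    } else {
        say_goodbye()
    };
    result
}

// Reading 2: each branch is coerced to Unit, as if a semicolon had
// been inserted after its last expression, so `result` is ().
fn say_something_unit(arriving: bool) {
    let result: () = if arriving {
        say_hello();
    } else {
        say_goodbye();
    };
    result
}
```

Both versions compile and print the same greeting, so type-checking alone cannot choose between them.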

Ideas for resolving the ambiguity:

  • Only do implicit coercion as a "last resort" (no idea how to formalize that)?
  • Storing or using the result of a block forces it to take the value of its final expression?
  • Require explicit coercion in cases like this?

Pros

  • Might reduce verbosity in many different situations.

Cons

  • Requires type-checking in the transpiler

  • Messing with such a basic feature of Rust could have far-reaching effects with hard-to-foresee consequences

  • I'm worried that this could collapse lots of different type errors into errors involving (), and obscure many errors' locations. That can lead to frustrating, uninformative error messages, akin to "... ended by \end{document}" in LaTeX. For example, the Rust code

    let x = if true {
        2.0
    } else {
        3
    };
    let x_sq = x*x;
    

    gives the error "if and else have incompatible types". Allowing implicit coercion to () might mean we instead get the error "cannot multiply () by ()" in the expression x*x, making it harder to figure out what and where the problem is.

Glen's proposal

Husht will generally be semicolon-free. I feel like most blocks will actually be fine with returning their last expression's value. So could we start with using exactly Rust's convention? In other words, if there is no semicolon on a block's last expression, its value is returned. If there is a semicolon, then its value is dropped/suppressed (whichever it actually is) and the block returns Unit. Since we are not using the semicolon for anything else in Husht (except possibly if you for some reason really want two statements on one line), it seems perfectly fine to use it for this. So I propose we start this way, which will clearly be super easy to implement, and see if there end up being an annoying number of semicolons that we need to have. My guess is that it will be a reasonably small handful, which will be fine. They do mean something, after all, so they are not WET (write everything twice): in that block-final position, they carry actual independent information.

If it turns out that a clear majority of block-final expressions need semicolons and we can't come up with a good, typing-free "80% rule" for when Husht should insert them, we could, as you suggest, flip the script and always insert them, adopting one of your proposed notations for explicitly suppressing them.

This proposal seems similar to what Julia does. A Julia function returns the value of its final expression by default. If you want it to return nothing, you have to explicitly end it with the expression nothing (or return nothing), which I've never done in my own code. (Maybe it's more routine in library code.) —Aaron

Sep 2 discussion takeaways

During the meeting, we weighed the merits of two approaches:

  • A block in Husht returns its last statement by default. Some token, like a final semicolon, can be used to suppress the return.
  • A block in Husht returns () by default. Adding some token to the final expression will cause it to be returned.

Both approaches sometimes require the last statement in a block to include an extra token. One advantage of returning () by default is that the extra token communicates the author's intent: "I want this value to be returned." When the last statement is returned by default, the extra token communicates something more contingent on implementation details and less related to the author's intent: "This final expression happens to return a value other than (), but I don't want that value to be returned."

Sep 12 tentative proposal

This is a mild change to Glen's first proposal: we use the Rust convention that unterminated statements propagate their value and a semicolon coerces that value to Unit, except for the last statement of a function that either has no declared return type or is explicitly typed to return Unit (by name, not an alias, so we can look for it syntactically). These statements will automatically have the semicolon inserted, because there is no way we could need to use their values.
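The syntactic check described above could look roughly like the following sketch. Everything here is hypothetical: the helper names, the representation of the declared return type as an optional string, and the assumption that Unit is spelled `()` in the signature are all illustrative rather than taken from the transpiler.

```rust
/// Hypothetical check: should the transpiler append a semicolon to the
/// last statement of a function body? `declared_return` is the return
/// type exactly as written in the signature, or None if none was written.
fn needs_final_semicolon(declared_return: Option<&str>) -> bool {
    match declared_return.map(str::trim) {
        None => true,       // no return type written
        Some("()") => true, // Unit by name (assumed spelling)
        Some(_) => false,   // any other type, including aliases of ()
    }
}

/// Hypothetical helper: emit the last statement, adding a semicolon
/// only when the rule calls for one and none is already present.
fn finalize_last_statement(stmt: &str, declared_return: Option<&str>) -> String {
    let trimmed = stmt.trim_end();
    if needs_final_semicolon(declared_return) && !trimmed.ends_with(';') {
        format!("{};", trimmed)
    } else {
        trimmed.to_string()
    }
}
```

Because the check only inspects the text of the signature, an alias such as a hypothetical `type Nothing = ();` is deliberately not recognized, which keeps the rule free of type-checking.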

We will see how much "semicolon jockeying" we need to do and adjust the principle if need be, perhaps adding a "ship this expression out" symbol.