Tempe Vale

Background

Tempe Vale, or Vale for short, is a database-driven web application.

Its primary purpose is as a module repository for the Temper programming language.

This document:

  • describes the entities that Vale deals with,
  • outlines the URL structure,
  • specifies security and testing requirements, and measures to meet them such as aligning type safety with safety,
  • details domain objects, and
  • lists the goals and high-level steps of major workflows.

The intent is to establish ground rules and goals for the system using this document and to iteratively fill in blanks.

  1. Ask what have I assumed is defined or available in one part of the spec but have not yet defined?
  2. Database schemas and interface/class definitions that can be assembled by generative AI and reviewed by humans.
  3. The client and server interactions will be designed at a high-level around cached domain objects and merge semantics with a server.
  4. Those flows will be captured as interaction diagrams and test scenarios generated by AI and reviewed by humans.
  5. A prose specification for the storage system abstraction will be designed by AI and reviewed by humans with the goal of having as much logic be database agnostic.
  6. An actual interface type will be generated from that prose specification and a test suite generated by AI and reviewed by humans.
  7. Once we have that we can start fleshing out a test suite for the storage layer.
  8. Next step is to generate database table definitions for human review.
  9. Then we're ready to generate the actual query logic for the definitions and to test for equivalence with our stub implementation.

Some sections of the spec are known to be very vague. We will need to:

  1. Write sections on XSS and SQL query safety establishing rules for XSS and SQL safety via secure-composition library tags.
  2. Write sections on localization addressing extracting strings for translation, reincorporating, and safety of translation strings (e.g. what if someone contributes xx-klingon translations with <script> tags)
  3. Flesh out federated auth: providers, vendors pros/cons, who is a domain expert who can identify common pitfalls in OAuth
  4. Flesh out publish action requirements, what can be shared across hosting providers, and what's specific. Do other people have actions for different providers? Their experiences?

Notational conventions

In italics are terms defined in this spec, e.g. person, people, and Delta type.

Programming language identifiers including type and method names are in a code font, e.g. Guarded<T>, privilegedGetPii().

URL path templates are in a code font too, e.g. /l/[libraryname], /p/[personid]. Any URL path part inside square brackets, e.g. [libraryname], is a placeholder for some text that, in this case, is a valid library name, possibly percent-encoded.

Entities

This section defines terms for the kinds of things Vale deals with.

People can look for, maintain, and critique libraries. They can be members of groups. Each person has an auto-assigned ID and an email address. Their profile lists information about them that they may elect to make visible to a particular group. A person may also have cryptographic public keys associated with their ID.

A user connects to Vale to use the site. If a user is logged in we know which person they are. If a user is not logged in we do not assign any person's privileges.

A group has a group ID, name, and may have other profile info. Like a person, a group may elect to keep some info public, some visible within the group. A group has members that are people. Every person is permanently, implicitly part of a group whose ID is "p-" followed by their auto-assigned ID, which only contains that person, and whose profile is the same as the person's profile. Another kind of group, known as named groups, has IDs starting with "g-" followed by an auto-assigned number. These groups have a list of other groups that are members (though every group is implicitly a member of itself) and the list is editable by people in groups that have the organizer role in the group. Roles are: organizer, member. Finally there is one group called public. In the context of a person or group we may say something is private, but by that we mean it's limited to the group or person's p- group. It is a terms of service violation for a group to mirror the name of a trademarked entity or real person without permission. A named group's member group list must not contain the public group.

We can say a user is in a group if the group is the special public group, or the user is logged in and the corresponding person's p- group is a member of the group.

A library has a name and corresponds to part of a source code project. It may have zero or more versions identified by server and associated with a location: enough info to fetch a git or other revision control repo and find the file source subtree. For the latest, stable info, we should be able to extract a short name and README info in zero or more locales following some convention. Libraries are managed by a particular group, and visible to a group.

A testimonial is text authored by a person in a locale about a library. Library managers may choose to allow testimonials on their library page, queue them for review, or disallow.

This specification intentionally does not allow for negative permissions. There is no way to define a named group with everyone in some other group except for Frank. By avoiding negative permissions, Vale knows that a person never has fewer permissions when logged in than when not logged in and restricted to public. Vale's caching via equivalence to /pub depends on this property.

Page and caching architecture

It is very important that the site works well with a content delivery network (CDN), so where the user's view is equivalent to the public view the server should redirect to the cachable /pub version.

Vale should help illustrate some of the benefits of being able to run the same logic on client and server so, as discussed below, each page render can be boiled down to a load of some initial JSON, turning that into domain objects, then rendering HTML from those client side. Ideally we'd have one main JS file, one CSS file, and aggressively cache those. For CSRF protection, we need JavaScript that creates or submits forms to have access to a secret which we can't embed in cached content, so the JS will need to be able to import a secrets-for-current-session JS module.

The full set of assets loaded into a page are:

  • Bundled, cached JS and CSS.
  • A bundle of locale specific strings relevant to the current user's preferred locale.
  • On demand, the CSRF token via import operator.
  • Images and other media as needed by the site design or embedded in third-party Markdown.

URL path conventions

This section outlines which paths serve which purposes. Inside [...] are placeholders indicating an arbitrary path element of that semantic category (e.g. libraryname, personid) and may include characters that require percent encoding for URL metacharacters.

For each URL path, and each locale there is a path that has `/pub/[locale]` prepended (or /pub alone when locale is not present). This is a platform URL convention rather than a separate content domain; the /pub prefix indicates public view semantics with no authenticated privileges. Locale is a BCP47 tag.

For each URL path, there is a path that has /json appended which, when serving a status 200 response has content-type application/json and which serves a structured representation of the data for that page. If using both /pub/[locale] and /json the /json part comes after.
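The prefix and suffix conventions above compose mechanically. As a minimal sketch, the helper below derives the variants of a path. The function name variantPath and its options are hypothetical, and the handling of the homepage path / is an assumption that this specification does not state.

```typescript
// Hypothetical helper: derives the /pub and /json variants of a Vale path.
// `path` is an already percent-encoded path such as "/l/mylib".
function variantPath(
  path: string,
  opts: { pub?: boolean; locale?: string; json?: boolean } = {}
): string {
  // Assumption: the homepage "/" contributes no path elements of its own.
  let result = path === "/" ? "" : path;
  if (opts.pub) {
    // /pub/[locale] is prepended, or /pub alone when no locale is given.
    const prefix = opts.locale ? "/pub/" + encodeURIComponent(opts.locale) : "/pub";
    result = prefix + result;
  }
  if (opts.json) {
    // The /json part always comes after any /pub/[locale] part.
    result = result + "/json";
  }
  return result === "" ? "/" : result;
}
```

For example, the public French JSON variant of /l/mylib would be /pub/fr/l/mylib/json under this sketch.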

/ is the homepage which allows login, searching, and explains Vale and disseminates Vale and Temper news.

/p/[personid] serves public info about the identified person. Normally an internal numeric ID and an email are interchangeable, but if the person's profile does not mark their email as public, this URL must return the regular 404 page when the email form of person ID is used. The bulk of the page is the about-me section of their profile. When viewing their own person page, a logged in user has easy access to their p- group page which allows creating and managing their libraries.

/p/[personid]/ts lists the person's testimonials and if the identified person is the current user allows editing.

/l/[libraryname] serves info about a library including name, source location, maintainers, testimonials, and any README. It also lists available versions and release notes. It's meant to allow maintainers to publicize their libraries and keep existing users up to date. If the current user isn't authorized to see the library they must get a standard 404. Only approved testimonials are shown on the library page.

/l/[libraryname]/b/[backendid] is for notes for users of that backend's target language related to the library. It includes the basic library info, but if a library has backend specific docs those are front and center. When a user searches for libraries, if they've identified that they're searching as a member of a particular language community then search results take them here instead of the /l/[libraryname] page.

/l/[libraryname]/vs lists available versions of the library and links to each version's details page.

/l/[libraryname]/v/[semver] has information about a particular version of a library. It also links to where translations of that library are published in downstream module repositories.

/l/[libraryname]/mod allows a library manager to moderate testimonials about the library.

/g/[groupid] displays allowed info about the group or a 404 if the identified group's name is not visible to the current user.

/g/[groupid]/ls lists the libraries the group manages.

/g/[groupid]/edit allows a user whose p- group is in any group with the organizer role to edit the group's membership: removing member groups, adding them, switching between roles.

Access Control

On the server, all database transactions are mediated by a User object so a response handler never gets a view of profile data, for example, that the current user is not authorized to view. User objects are distinct from Person objects, so having access to a Person object does not imply the ability for code to assume that person's privileges.

One goal of access control architecture is to make it easy to test outside of end to end testing. Testing that something doesn't show up in HTML is error prone because design people need to change HTML structure. We need our database connections to be abstract enough to allow unit tests to specify the entire contents of a DB used for a unit test. That way, we can seed a test with some well known substrings that should not leak in a particular testing scenario.

Testing access control by looking at objects acquired from a stubbed database connection is probably most efficient, but some tests looking for sensitive substrings in rendered HTML after entity, JS, and JSON escape decoding can give added confidence.

The Guarded<T> type allows for representing in a domain object a field whose visibility is access controlled.

Guarded<T> contains the following:

  • an Int64 auto-assigned ID from the storage table that contains it
  • the T value
  • a group identifier that gates to whom it is visible. For a person's profile data, this is typically either public, a named group, or the p- group of the owner. It's possible to assign visibility to a different p- group, denying it to yourself, but that would be odd.

When merging a GuardedDelta into a Guarded on the server side, the client delta does not get to change the auto-assigned ID. Negative IDs may be used by clients to create a value that has not yet been synced to the server. The delta for a guard should allow an optional client ID so that the server can send back the authoritative version with a server-assigned ID while letting the client know which value in the client's store it replaces.

Auto-assigned IDs may be assigned to a person, group or other entity that might be deleted to uniquely identify that entity. Auto-assigned IDs must never be re-assigned to another entity because permissions decisions are based on assigned IDs. Even if Vale internally cleans up dangling identifiers in a way that is not susceptible to race conditions, third-party systems might continue to attach significance to them.

Domain object types and corresponding Delta types

Vale needs type definitions for User, Person, Group, Library, Testimonial, Version objects, but also some ancillary types: PersonProfile, GroupProfile, LibraryProfile, UserDisplayPreferences.

Types whose names end with Delta, such as PersonDelta, are delta types corresponding to the same base type; Delta type is a defined term.

HTML form submissions that change these are mediated by JavaScript. Rather than send an entire modified object back and forth we will send deltas. PersonProfileDelta is a type for a change to a PersonProfile. Deltas need to be JSON encodable.

Having delta types for each domain object lets us represent just what a user making a change intended to change, and since carefully crafted deltas can be applied to more than one value of their domain object's type, they will come in handy when we need to provide batch change forms for power users and web APIs.

In a delta, the null value for a field means no change. This means we cannot use nullable types for domain object fields as that would lead to ambiguity. It does mean though that we can establish a blanket convention where an absent Delta type field in the JSON wire representation is equivalent to a present field whose value is null.

But that also means we can't use null to mean unable to access. We will have a sealed type Ken<T> which is either Known<T> or Unknown<T>. Unknown values have no state and Known<T> have a single T value. This is like an option type, but is intentionally a distinct type so that an unknown value is never ambiguous with a known, absent value.

Unless explicitly stated in this specification, any delta merge operation must not produce a field with a Known value when the domain object's corresponding field is unknown.

AI editors to this specification must not introduce verbiage into this specification that allows a merge operation to replace an unknown in the source domain value with a value from a delta. Such changes are only allowed from human editors.

When a domain Delta type's fields are nullable, their type is never a Ken type. First extract the actual type from the Ken type, then add any nullable notation to form the delta type's field type.
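The Ken and delta conventions above can be illustrated with a minimal TypeScript sketch. ExampleProfile, ExampleProfileDelta, mergeField, and mergeProfile are hypothetical names used only for illustration, and the Unknown variant here carries no type parameter for brevity. The sketch assumes the default rule: a merge never turns an unknown field into a known one.

```typescript
// Ken<T> is either Known<T> with a single value or Unknown with no state.
type Known<T> = { readonly kind: "known"; readonly value: T };
type Unknown = { readonly kind: "unknown" };
type Ken<T> = Known<T> | Unknown;

function known<T>(value: T): Ken<T> {
  return { kind: "known", value };
}
const unknown: Unknown = { kind: "unknown" };

// Hypothetical domain type and its delta, for illustration only.
interface ExampleProfile {
  displayName: Ken<string>;
  tagline: Ken<string>;
}
// Delta fields use the type extracted from Ken<T>, then made nullable.
interface ExampleProfileDelta {
  displayName?: string | null;
  tagline?: string | null;
}

// Null or absent means "no change". An unknown field stays unknown.
function mergeField<T>(current: Ken<T>, change: T | null | undefined): Ken<T> {
  if (change === null || change === undefined) return current;
  if (current.kind === "unknown") return current;
  return known(change);
}

function mergeProfile(p: ExampleProfile, d: ExampleProfileDelta): ExampleProfile {
  return {
    displayName: mergeField(p.displayName, d.displayName),
    tagline: mergeField(p.tagline, d.tagline),
  };
}
```

Because absent and null are treated alike, a delta decoded from the JSON text {} leaves every field untouched.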

Database interactions

Goals of our storage architecture:

  • Simple implementation on top of a relational database
  • Ease of writing test suites for modules that use storage that run tests in parallel without edits to simulated storage causing observable changes to other tests without spinning up many small databases
  • Simple access control enforced before domain objects escape the database into the bulk of application code
  • Ease of answering the question "does this bundle of domain objects that inform this page's content contain any non-public state that would prevent a simple redirect to the CDN-cached /pub version of the page?"
  • Answering questions of the form: has a particular domain object changed since a client received data? For example, if a person opens a form on their mobile browser to edit their profile, gets distracted, edits the profile via a desktop machine and submits those changes, it would be nice if, when they eventually switch back to their mobile browser, the JavaScript on the mobile page could display a banner notifying them that the form has some old data and offer them options.

A UserContext object, spawnable from a User object, allows retrieving domain objects from storage. And it allows applying deltas to storage with the user's privileges. It can keep a boolean that indicates whether any domain object was retrieved in a way that relied on the user's permissions to eventually answer the question about whether /pub is sufficient.
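One way the boolean described above might be tracked is sketched below. SketchUserContext, GuardedValue, read, and pubIsSufficient are hypothetical names, the Guarded ID is omitted for brevity, and the sketch assumes the set of groups the user is in has already been computed.

```typescript
// Hypothetical, simplified stand-in for Guarded<T> (auto-assigned ID omitted).
interface GuardedValue<T> {
  value: T;
  visibleTo: string; // a group ID such as "public", "p-7", or "g-3"
}

class SketchUserContext {
  // Becomes true once a value was readable only because of the user's groups.
  private reliedOnPrivileges = false;
  private readonly userGroups: ReadonlySet<string>;

  constructor(userGroups: ReadonlySet<string>) {
    this.userGroups = userGroups;
  }

  // Returns the value if visible to the user, or undefined otherwise.
  read<T>(g: GuardedValue<T>): T | undefined {
    if (g.visibleTo === "public") return g.value;
    if (this.userGroups.has(g.visibleTo)) {
      this.reliedOnPrivileges = true;
      return g.value;
    }
    // A denied read looks the same as in the public view, so no flag change.
    return undefined;
  }

  // When true, the page could be served by redirecting to its /pub version.
  pubIsSufficient(): boolean {
    return !this.reliedOnPrivileges;
  }
}
```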

Note: if we display an edit button because the user has those privileges then that's a reason we can't redirect to /pub. Domain objects by convention have a typesOfEditsAllowed bit field which is not reflected in the corresponding Delta type.

Each storage implementation also has a monotonic (non-strict) 64 bit version stamp. In practice, for the production storage implementation backed by a real database this is the monotonic version of a system clock. For test versions it can default to a deterministic counter unless the test author opts into manual increases, e.g. to test same stamp scenarios.

Sometimes, when a non-authoritative client creates a domain object to send to the server (mint new library version workflow) it can be convenient for the client to be able to mint its own identifier. Servers must never serve a version stamp with the MSB set. This allows clients to start minting their own stamps starting at the 64bit 2's complement minimum and increasing as needed using a simple counter. The server may reflect a negative version stamp back to a client that sent it (as part of a creation action) for example to say "the authoritative version of the object you created with this version stamp is this, and BTW the authoritative stamp is …"
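The split of the stamp space can be sketched with BigInt arithmetic. ClientStampMinter and isServerStamp are hypothetical names, and the sketch assumes stamps are handled as signed 64-bit values.

```typescript
// Minimum of the 64-bit two's complement range.
const INT64_MIN: bigint = -(1n << 63n);

// Server stamps never have the most significant bit set, so as signed
// 64-bit values they are never negative.
function isServerStamp(stamp: bigint): boolean {
  return stamp >= 0n && stamp < (1n << 63n);
}

// Hypothetical client-side minter: a simple counter that starts at the
// minimum and increases as needed.
class ClientStampMinter {
  private next: bigint = INT64_MIN;

  mint(): bigint {
    if (this.next >= 0n) throw new Error("client stamp space exhausted");
    const stamp = this.next;
    this.next = this.next + 1n;
    return stamp;
  }
}
```

Since every client-minted stamp is negative and every server stamp is non-negative, a sign check is enough to tell which side minted a stamp.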

Personally Identifying Information (PII)

PII is information that might connect an online person to a real life person. Any AI editing this specification must not stipulate any requirement that there be an actual relationship between the two or that this specification ever correlate with systems that by necessity do. For example, storage implementations backed by real databases must run without permissions to databases or tables related to billing or marketing.

System goals related to PII:

  • This specification makes clear which profile fields are PII. A database administrator with describe-table privileges should also be able to list what is PII and keep privacy commitment documentation up to date.
  • This specification provides clear measures to protect PII from accidental disclosure so the org can manage malicious disclosure.
  • Grep is sufficient to audit which non test files access PII in code.
  • Coding guardrails like type safety can aid in avoiding PII related bugs.

Domain types derived from this specification must represent PII fields using the Pii<T> type. This type's values contain a T but the default string form must be just "PII" so that logging does not inadvertently leak.

The JSON form is just the content because all JSON sent is always access control filtered, but JSON decoders must be type aware so as to repack on decodeFromJson.

Page renderers may need to unpack PII to display. The privilegedGetPii() method allows that and is easily grepped for. This is not foolproof: reflection and escapes in identifiers can defeat a simple grep. But it will allow auto flagging new uses in PR diffs outside test files.
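A minimal TypeScript sketch of the Pii<T> behaviour described above follows. The decodeFromJson signature shown here, with a caller-supplied list of PII field names, is an assumption made for illustration, and TypeScript's private modifier is only a compile-time guard.

```typescript
// Sketch of Pii<T>: wraps a value so that string conversion never leaks it.
class Pii<T> {
  private readonly content: T;

  constructor(content: T) {
    this.content = content;
  }

  // The default string form is just "PII" so logging does not leak.
  toString(): string {
    return "PII";
  }

  // The JSON form is the bare content; JSON.stringify calls toJSON.
  // JSON sent is assumed to be access-control filtered already.
  toJSON(): T {
    return this.content;
  }

  // The single, greppable way to unpack the value.
  privilegedGetPii(): T {
    return this.content;
  }
}

// Type-aware decoding repacks the named fields into Pii wrappers.
// The piiFields parameter is a hypothetical stand-in for schema knowledge.
function decodeFromJson(json: string, piiFields: readonly string[]): Record<string, unknown> {
  const raw = JSON.parse(json) as Record<string, unknown>;
  for (const field of piiFields) {
    if (field in raw) raw[field] = new Pii(raw[field]);
  }
  return raw;
}
```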

Reviewers should pay special attention to code that uses Pii unpacking to assign another field. If the other field is not marked PII, for example it is a prose description of a library and the PR helpfully but unwisely auto fills it with "${author}'s new library: ${libraryName}", then info is leaked across clearance levels. Completely preventing declassification is the purview of systems designed for that from the operating system up, so this specification focuses on clarity, auditability, and testability.

Fine-grained access control allows people who want to carefully manage their online persona to do so, while allowing those who want to build their offline brand as a public contributor to do so. Some people may be required by conditions of their employment to have their name visible within an organization but not wish to outside, and basing visibility on groups is meant to allow that balance.

Another PII protecting measure is to make it easy to render info when PII fields are unavailable. If displaying a list of group members where the current user is authorized to see that a person is a member but not to see their name, a convenience method like privilegedGetNameOrFallback() can make it easy to substitute person#[auto-assigned-number] when the name is access restricted. Such conveniences should return HtmlRenderable to aid localization of placeholder text.

Markdown processing

Markdown to HTML converters are well understood but are a frequent source of XSS when the Markdown comes from an untrusted source.

All Markdown from third-party content must be sanitized after the Markdown to HTML conversion, and checked for unbalanced bidi content.

We define short Markdown as a subset of Markdown that, when converted to HTML, has non-inline elements also stripped. This allows short Markdown that can display on one line inside a bounded box. Useful for names and snippets. HTML rendering of short Markdown fields should render it in such a way that the height is limited to one line in the current font and vertical overflow is hidden.

(We may experiment with using iframe srcdoc with same-origin disallowed to sequester markdown if there is a desire for interaction. Since Temper allows JavaScript, some projects may wish to deploy web playgrounds)

Library documentation

As specified elsewhere, a library is established by the existence of a config.temper.md source file and the directory containing it is the library root. The library sources comprise that configuration file and any other *.temper and **.temper.md files under the library root which are not part of a nested library.

Users searching Vale for libraries that would help them need a way to understand what a library does, and how to get started using it.

Any README*.md files in the library root are assembled and we extract a locale mapping. If the file name does not start with "README." or it ends with ".temper.md" it is excluded.

We then sort the files by name, so that if two files have an equivalent BCP-47 locale but differ by case, which one is used is deterministic.

If a file has a name of the form "README.[localetag].md" then localetag is parsed as a BCP-47 locale tag. If it is not a valid tag, the file is excluded. If it is, we add a mapping from that tag to the file.

If a file has the name "README.md", and there is no mapping for the "en" locale, we add a mapping to that file for the "en" locale.

Now we have, for the library version, a mapping from locales to markdown files.
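The README steps above can be sketched as follows. The function name readmeLocaleMap is hypothetical, the tag pattern is a simplified stand-in for real BCP-47 validation, and two choices are assumptions rather than stated rules: map keys are lower-cased, and when tags differ only by case the first file in sorted order wins.

```typescript
// Simplified stand-in for BCP-47 validation (an assumption): a 2-3 letter
// language subtag followed by alphanumeric subtags of 1-8 characters.
const TAG_PATTERN = /^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$/;

// Builds a map from lower-cased locale tag to README file name.
function readmeLocaleMap(fileNames: readonly string[]): Map<string, string> {
  const candidates = fileNames
    .filter((n) => n.startsWith("README.") && n.endsWith(".md") && !n.endsWith(".temper.md"))
    .sort(); // default code-unit order makes the outcome deterministic
  const map = new Map<string, string>();
  for (const name of candidates) {
    if (name === "README.md") {
      // Plain README.md stands in for "en" only if "en" is not mapped yet.
      if (!map.has("en")) map.set("en", name);
      continue;
    }
    const tag = name.slice("README.".length, name.length - ".md".length);
    if (!TAG_PATTERN.test(tag)) continue; // invalid tag: the file is excluded
    const key = tag.toLowerCase();
    // Assumption: for tags that differ only by case, the first sorted file wins.
    if (!map.has(key)) map.set(key, name);
  }
  return map;
}
```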

A library's latest version is the version whose semantic version is greatest and which is not a pre-release version.
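As a minimal sketch of that rule, the function below picks the greatest non-pre-release version from a list of version strings. The name latestStable is hypothetical, and the pattern is a simplification that does not enforce every rule of the semantic versioning specification.

```typescript
// Matches "MAJOR.MINOR.PATCH" with optional "-prerelease" and "+build" parts.
const SEMVER = /^(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

interface ParsedVersion {
  text: string;
  major: number;
  minor: number;
  patch: number;
}

// Positive when a is greater than b. Only stable versions are compared,
// so pre-release ordering rules are not needed here.
function compareStable(a: ParsedVersion, b: ParsedVersion): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Returns the greatest version that is not a pre-release, if any.
function latestStable(versions: readonly string[]): string | undefined {
  let best: ParsedVersion | undefined;
  for (const text of versions) {
    const m = SEMVER.exec(text);
    if (m === null || m[4] !== undefined) continue; // skip invalid and pre-release
    const parsed: ParsedVersion = {
      text,
      major: Number(m[1]),
      minor: Number(m[2]),
      patch: Number(m[3]),
    };
    if (best === undefined || compareStable(parsed, best) > 0) best = parsed;
  }
  return best === undefined ? undefined : best.text;
}
```

Components are compared numerically, so 1.10.0 is greater than 1.9.9 even though it sorts earlier as text.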

When serving documentation for a library (URL /l/[libraryname]), if there is a latest version, the latest version's locale-to-documentation map is used. If serving documentation for a particular version (URL /l/[libraryname]/v/[semver]), that version's locale-to-documentation map is used.

When serving documentation for a library, if the library has no latest version or the documentation map is empty, Vale returns a localized "No documentation yet for this library" placeholder.

If Vale has a documentation map, we use the current user's display preference's locale to pick a file based on locale fallback rules, falling back to "en" if available and to the lexicographically least locale if not.
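The selection order can be sketched as below. The locale fallback rules are not spelled out here, so this sketch assumes the simplest form: dropping trailing subtags one at a time, similar in spirit to RFC 4647 lookup. The name pickLocale is hypothetical and map keys are assumed to be lower-cased tags.

```typescript
// Picks a key of the documentation map for the preferred locale.
function pickLocale(available: ReadonlySet<string>, preferred: string): string | undefined {
  // Assumed fallback rule: try "pt-br", then "pt", and so on.
  let tag = preferred.toLowerCase();
  while (tag.length > 0) {
    if (available.has(tag)) return tag;
    const cut = tag.lastIndexOf("-");
    tag = cut < 0 ? "" : tag.slice(0, cut);
  }
  // Next, fall back to "en" if available.
  if (available.has("en")) return "en";
  // Finally, the lexicographically least locale; undefined when the map is empty.
  return Array.from(available).sort()[0];
}
```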

The documentation snippet for a library or version is obtained first by using the documentation map with the user's display preference's locale as above, but then obtaining the first string available and non-space-only from among the following:

  1. The og:description from the Markdown file's front-matter
  2. The description from the front-matter
  3. The title from the front-matter
  4. The text of the first level-1 header in the Markdown body
  5. If a non-empty string is needed because it is not displayed alongside the library name, the library name text.
  6. Else, an empty string.
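The priority list above amounts to a first-non-blank search. In the sketch below, ReadmeInfo, documentationSnippet, and the needNonEmpty flag are hypothetical names, and front-matter parsing is assumed to have happened already.

```typescript
// Hypothetical shape for data already extracted from the chosen Markdown file.
interface ReadmeInfo {
  frontMatter: { [key: string]: string | undefined };
  firstLevel1Header?: string;
}

// Returns the first candidate that is present and not space-only.
function documentationSnippet(
  info: ReadmeInfo,
  libraryName: string,
  needNonEmpty: boolean // true when the snippet is not shown beside the library name
): string {
  const candidates: Array<string | undefined> = [
    info.frontMatter["og:description"],
    info.frontMatter["description"],
    info.frontMatter["title"],
    info.firstLevel1Header,
    needNonEmpty ? libraryName : undefined,
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") return candidate;
  }
  return "";
}
```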

Schemas

Above, we've listed types of domain objects. Each subsection here lists the corresponding fields and metadata like notional type, optionality, PII, and for strings the content-type to aid in escaping and sanitization.

This is meant to aid in database table design and query building, defining Temper classes and interfaces both for the domain type, but also for its delta, and the generation of code that, via HTML, displays or enables editing interactions.

User

There are two kinds of users, so this warrants a sealed interface with two variants:

AnyUser represents the state where Vale has no login info, so only public info is available and no editing actions are possible. SomeUser contains a Person in addition to the common fields below.

Fields

  • prefs: DisplayPreferences

User objects have DisplayPreferences. For AnyUser values, the DisplayPreferences use a locale inferred from request headers like Accept-Language and/or geo-location. For SomeUser values, the DisplayPreferences come from storage.

Vale should restrict the creation of User objects to a small amount of code that extracts and validates session headers from the HTTP request.

DisplayPreferences

Fields

  • locale: Locale

More complicated preferences would break the equivalence between a page for the user that requires no non-public info and the /pub version.

AccountInfo

A person needs to be able to log in to become the user. AccountInfo is info about a person that is never displayed to any user other than the user corresponding to that person.

These fields are not typed as Guarded, because they are only used for login flow, so one person's account info is never sent to the client when the user is not that person.

  • id: Int64. Same as the person auto-assigned id.
  • email: Pii<String>. This is used during login.
  • providerIdentifier: String. The OAuth provider identifier, e.g. GitHub.
  • subclaim: A provider specific unique identifier per OAuth conventions.

Vale does not need nor store the name info. It requests it as part of OAuth during the creation flow to auto-populate the user's name field, but does not store it past that point.

A person may change to a different provider in which case, all but the id may change.

Email must never be shared by two account records. Any changes to an account to use a different email and/or provider must be done in a transaction that guards against duplicate emails for different ids. Account recovery flows are TBD but probably require this.

PersonProfile

People have different goals when interacting with Vale.

Some want to promote their real life persona's public image as a contributor to OSS. Some want to keep their public and online personas separate. Pragmatically, some want their coworkers to know who they are in the organization. Perhaps inside example.com the internal employee directory who.example.com/alice is a great introduction, but that's not visible outside, and talking about internal websites is discouraged. But their public software work is best introduced via their stackoverflow or github account pages.

In these three scenarios, the first might use the public group permissions a lot. The second might leave most of their profile blank, or restrict it to a group that they and a few friends maintain. The third might have multiple bios and social URLs some with company group visibility and some with public visibility.

Most everything about a PersonProfile is PII.

The fields for a person profile are defined in PrincipalProfile, next.

PrincipalProfile

For the purposes of this document, "principal" means that which acts as a subject in important sentences about entities: "This person creates a library", "That group manages the library." Users are not principals because users in general can only view; only a logged-in user, corresponding to a person, can effect change.

Principal profiles collect information about a principal so that users can understand who can do what. Reputation is important in choosing which software building blocks to use: who has the specialist expertise to build useful abstractions, who is skilled in execution, who maintains what they write. Vale needs to let principals manage their online personae to make a case for why their libraries are worth investigating.

PrincipalProfile is a sealed interface with shared fields between PersonProfile and GroupProfile.

This type has a list of some fields that you might expect to see one of just to allow different group associations, but if multiple bios, for example, are visible, display code should just show them in sequence. If multiple avatars are visible, since avatars are meant to occupy a fixed amount of visual real estate, display code should just show the first.

Fields

  • id: Int64; auto-assigned. Not changeable via delta.
  • name: Guarded<Pii<String>>, text/markdown but short markdown. Terms of use require no relationship to a real person, but if the name uses a trademarked phrase without permission, that's a ToU violation.
  • bio: Guarded<Pii<String>>, text/markdown; People can describe themselves and their goals here. Groups can describe their purpose here.
  • urls: List<Guarded<Pii<String>>>, URL; to things like Github, stackoverflow, social websites, company directories, personal blogs, etc.
  • avatars: List<Guarded<Pii<String>>>, URL; links to things like gravatar. Terms of Service require no NSFW avatars.
  • favouriteGroups: List<Guarded<GroupId>>, groups the user likes. If public, can advertise affiliation with a high-reputation group. If private, just helps auto-populate options like managedBy when creating libraries. (Visible, favourite groups that the person is not a member of should be distinguished using neutral terminology: "Likes" vs "Affiliations")

URLs will be limited to http or https or mailto. (We may expand to tip jar style custom mobile URIs based on user demand but often there are https equivalents.) There is no terms of service requirement that a mailto appear or that it match the email address on file.
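A scheme allowlist check along these lines could be as small as the sketch below. The name hasAllowedScheme is hypothetical, and rejecting relative URLs is an assumption of this sketch rather than a stated requirement.

```typescript
const ALLOWED_SCHEMES: readonly string[] = ["http", "https", "mailto"];

// Returns true only when the URL starts with an explicit, allowed scheme.
// Relative URLs and anything that does not parse as "scheme:" are rejected.
function hasAllowedScheme(url: string): boolean {
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(url.trim());
  if (match === null) return false;
  const scheme = String(match[1]).toLowerCase();
  return ALLOWED_SCHEMES.indexOf(scheme) >= 0;
}
```

Checking against an allowlist, rather than blocking known-bad schemes, means schemes such as javascript: and data: are rejected without being named.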

GroupProfile

A group profile only exists for named groups, "g-" groups.

PrincipalProfile allows getting at fields for either a person or a group.

In lieu of a ubiquitous profile object, there is a short Markdown descriptor available for any group.

  • For the public group, it is the term "everyone" appropriately localized.
  • For an individual, "p-" group, it is the name if visible or otherwise "person#[auto-assigned-number]" with "person" appropriately localized.
  • For a "g-" group, it is the group name if visible or otherwise "group#[auto-assigned-number]" with "group" appropriately localized.

GroupProfile currently has no fields beyond PrincipalProfile.

Named groups also have GroupMembership which is a separate domain object.

GroupMembership

GroupMembership lists which groups are in a named group.

Semantically, a group has organizers (can edit the group membership and the group profile) and members (cannot). Another group is in the group if it is either an organizer or member or it is transitively in any element of organizers or any element of members.

For shorthand this specification says a person is in a group when that person's corresponding p- group is in the group.

In the database, it's sufficient to represent groups via a table with three columns (contains, contained, role). The first two can be keys. The first column only contains g- group IDs, but the second can contain either g- or p- group IDs. Role is an enum (organizer, member).

It is legal for two named groups to contain each other. To rule otherwise would mean managing race conditions in databases between different group organizers. Code must not iterate members of groups without taking measures to avoid cycles.

Databases can efficiently compute this. AGE extends Postgres to make transitive closure computation efficient via cypher queries.

In the type representation, absent a database, it's convenient to have a bit more structure for a group, to break p- members out since they don't need to be transitively traversed. This would allow computing group membership on the client if really necessary.

Fields:

  • gOrganizers: List<Int64>, the auto-assigned ID parts of g- IDs
  • gMembers: List<Int64>, the auto-assigned ID parts of g- IDs
  • pOrganizers: Set<PersonId>, the auto-assigned ID parts of p- IDs
  • pMembers: Set<PersonId>, the auto-assigned ID parts of p- IDs

Here's an algorithm that is not efficient but which memoizes well, and can be used in test code and to check against a Cypher implementation for equivalence. The memoized group membership check takes two group IDs, container and contained, and a memo-table:

  1. if the two group IDs are the same, return true
  2. if the contained is the special group public, return false
  3. if the container is the special group public, return true
  4. if the container is a p- ID return false because it only contains itself and failed the check above
  5. if memoTable[pair(container, contained)] is present, return it
  6. let containerId: Int64 be the container's auto-assigned id part because it must be a g- group
  7. let containedId: Int64 be the contained's auto-assigned id part
  8. let isP: Boolean be true when contained is a p- group. Otherwise it's a g- group because we eliminated the special group public above.
  9. set memoTable[pair(container, contained)] = false. Guarantees halting even in the presence of group membership cycles.
  10. let result = false
  11. if isP and containedId is in either pOrganizers or pMembers, set result to true
  12. else if !isP and containedId is in either gOrganizers or gMembers, set result to true
  13. else for each element of the union of the container's gOrganizers and gMembers:
    1. recursively check group membership with element as the container, the same contained, and the same memo-table
    2. if it is a member, set result to true and break from the loop
  14. set memoTable[pair(container, contained)] = result
  15. return result
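The steps above can be sketched in Python for use in tests. This is an illustration under stated assumptions rather than the storage layer's implementation: group IDs are modeled as the string `"public"` or a `("g" | "p", id)` tuple, and `groups` is a hypothetical in-memory lookup from g- IDs to membership records. The recursion asks whether each nested group contains the contained ID.

```python
from dataclasses import dataclass, field

PUBLIC = "public"  # hypothetical encoding of the special public group


@dataclass
class GroupMembership:
    g_organizers: list[int] = field(default_factory=list)
    g_members: list[int] = field(default_factory=list)
    p_organizers: set[int] = field(default_factory=set)
    p_members: set[int] = field(default_factory=set)


def is_in_group(container, contained, groups, memo):
    """Return True when `contained` is transitively in `container`.

    `groups` maps a g- group's auto-assigned ID to its GroupMembership.
    `memo` is a dict keyed by (container, contained) pairs.
    """
    if container == contained:
        return True
    if contained == PUBLIC:
        return False
    if container == PUBLIC:
        return True
    if container[0] == "p":
        return False  # a p- group only contains itself
    key = (container, contained)
    if key in memo:
        return memo[key]
    container_id = container[1]  # must be a g- group at this point
    kind, contained_id = contained
    is_p = kind == "p"
    memo[key] = False  # provisional entry; guarantees halting on cycles
    membership = groups.get(container_id, GroupMembership())
    result = False
    if is_p and (contained_id in membership.p_organizers
                 or contained_id in membership.p_members):
        result = True
    elif not is_p and (contained_id in membership.g_organizers
                       or contained_id in membership.g_members):
        result = True
    else:
        for element_id in [*membership.g_organizers, *membership.g_members]:
            if is_in_group(("g", element_id), contained, groups, memo):
                result = True
                break
    memo[key] = result
    return result
```

One caveat to keep in mind when comparing against a Cypher implementation: when groups form a cycle, a group first visited partway around the cycle can be left with a stale `False` entry even though the top-level answer is correct. Passing a fresh memo-table per top-level query, as the tests for this sketch do, avoids relying on those intermediate entries.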

For efficiency and ease of use, UserContext contains a GroupMembership helper object that either uses storage to check or caches group membership operations so that repeated questions about group membership can be efficiently answered.

LibraryProfile

A library profile has information about a library.

Access to the library as a whole is gated by visibleTo in LibraryManagement. This allows restricting access to an organization's member group (or a subgroup within an org) or to a group of informal collaborators.

This information is not marked Pii because a library is not about a person. A library author might name their library something like "Sam's awesome widget juggling library" but as long as that is an intentional, personal choice to declassify, Vale need not treat library metadata as Pii.

Fields:

  • id: Int64 auto-assigned
  • name: String; text/plain, unique
  • versions: List<SemVer>
  • latestVersionProfile: VersionProfile?
  • tags: TagSet
  • resources: Resources?, a bundle of URLs with a ResourceClassification: docs, issue tracker, git repo, etc.
  • created: DateTime

The managedBy field should not be part of storage's libraryprofile table. See the …/ls URL above. That is easier to fetch if there is a dedicated, double-keyed, many:1 libraryManagement table (see below).

Only if the current user is in the managedBy group are these additional details, from separate management storage tables, available:

  • testimonialModeration: one of Allow, Queue, Forbid; controls which testimonials show up on a library page instead of only on the author's testimonials page

The latestVersionProfile is not stored in the database with the other fields, nor is it editable. It is computed, but bundled by the storage layer in the domain object since the latestVersionProfile determines most of the visible content on the library page.

The latestVersionProfile is null if no non-pre-release version has been published.
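As a rough sketch of how storage might pick the version behind latestVersionProfile, the function below returns the highest version without a pre-release tag, or `None` when there is none. It assumes "latest" means highest semver precedence rather than most recently published, which the text above does not settle, and the function name and string representation of versions are illustrative only.

```python
import re
from typing import Optional

# MAJOR.MINOR.PATCH with optional -prerelease and +build parts.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def latest_release(versions: list[str]) -> Optional[str]:
    """Return the highest non-pre-release version, or None if there is none."""
    best: Optional[str] = None
    best_key: Optional[tuple[int, int, int]] = None
    for version in versions:
        match = _SEMVER.match(version)
        if match is None or match.group(4) is not None:
            continue  # skip malformed strings and pre-releases
        key = (int(match.group(1)), int(match.group(2)), int(match.group(3)))
        if best_key is None or key > best_key:
            best, best_key = version, key
    return best
```

Comparing numeric tuples rather than strings matters here: `1.10.0` sorts after `1.2.0` numerically but before it lexically.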

See also the library creation flow below.

LibraryManagement

This contains info necessary to enable changes to the library but also permissions info.

  • id: Int64, auto-assigned id for the library
  • managedBy: GroupId, who can change the library
  • visibleTo: GroupId, who can see that the library exists, its profile data, and can download artifacts.

If a group manages a library and the group is deleted, the managedBy group ID may not refer to an existing group. TBD: orphaned library workflow that encourages OSS contributors to take over abandoned libraries.

LibraryAutomation

This is editable info related to a library that is controlled by the managedBy group.

It's for secrets that allow authorized third parties like the ValePublish action to publish versions without repeated authorization by library managers.

The Vale UI must never expose the secrets to any user, not even managers. Managers may store, but not retrieve. Managers should create a secret, store it securely with the third-party (e.g. a Github action), and store it here. The action can then use that secret to attest to its authorization to publish a new library version.

Fields:

  • id: Int64, auto-assigned id for the library
  • secret: bytes, a cryptographic secret
  • description: String, text/plain; allows managers to know what they stored so they can delete keys that are no longer needed
  • created: DateTime; when the secret was stored so that managers can perform key rotation according to their needs

This is a one:many mapping: one library may have many stored secrets.

Vale's web UI should not have read access to the database column owning secrets. A separate publication system is the only one that needs them, and the database could be implemented using an identifier for the secret which points into a separate storage area.
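The "store, but not retrieve" rule can be illustrated with a store whose public surface has no way to read a secret back. This is a simplified sketch under assumptions: the class and method names are hypothetical, storage is an in-memory dict, and the manager-facing and publication-facing operations are shown in one class even though the text above calls for separating them.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SecretSummary:
    """What managers may see: never the secret itself."""
    description: str
    created: datetime


class LibraryAutomationStore:
    def __init__(self) -> None:
        # library id -> list of (secret bytes, summary); one:many
        self._rows: dict[int, list[tuple[bytes, SecretSummary]]] = {}

    def store(self, library_id: int, secret: bytes, description: str) -> None:
        summary = SecretSummary(description, datetime.now(timezone.utc))
        self._rows.setdefault(library_id, []).append((secret, summary))

    def list_summaries(self, library_id: int) -> list[SecretSummary]:
        # Manager-facing view: descriptions and dates only.
        return [summary for _, summary in self._rows.get(library_id, [])]

    def attests(self, library_id: int, presented: bytes) -> bool:
        # Publication-facing check; compare_digest avoids timing leaks.
        matched = False
        for secret, _ in self._rows.get(library_id, []):
            if hmac.compare_digest(secret, presented):
                matched = True
        return matched
```

The `created` timestamp in each summary is what lets managers decide when to rotate a key without ever seeing it.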

VersionProfile

Vale's unit of publication is a library version, uniquely identified by a pair of (library name, semver tag).

Access control to a version profile is restricted based on the visibility of the owning library.

As discussed above, published versions provide mappings of locale to documentation essential for connecting library providers and library users.

Like library fields, version fields are not Pii. Documentation files might have author names in the front matter and/or in the prose, but see explanation of intentional declassification above.

Fields:

  • version: SemVer
  • docs: Map<Locale, String>, Markdown
  • published: DateTime

Testimonial

A testimonial represents a person's opinion about a library.

Library managers get to choose which testimonials show up, so they can elect to only show positive ones.

Fields:

  • id: Int64, auto-assigned
  • author: PersonId
  • library: LibraryName
  • opinion: Markdown
  • authored: DateTime
  • visibleTo: GroupId

There is intentionally no star-rating. Vale will not attempt to aggregate what should be thoughtful and nuanced critiques into a star rating.

A person can choose to make their testimonials visible only within an org: "This library is fine, but we at example.com have invested effort in deep integrations with XYZ so please use that instead." That's enabled by visibleTo.

If no member of a library's management group has access to a testimonial, then it will not show up in the managers' moderation queue.

TestimonialBundle

A testimonial bundle is just a group of testimonials. Bundles support paging.

Fields:

  • testimonials: List<Testimonial>
  • page: Int32, for paging, zero-indexed
  • pages: Int32, count of pages. May be zero if testimonials is empty

A testimonial bundle can be all testimonials related to a library, or from an author.
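The paging fields can be illustrated with a small helper that slices a full list into one bundle. The page size, the clamping of out-of-range pages, and the dict representation are assumptions made for this sketch; the text above fixes only the three field names and the zero-based index.

```python
import math


def make_bundle(testimonials: list, page: int, page_size: int = 10) -> dict:
    """Build one page of a testimonial bundle (page is zero-indexed)."""
    pages = math.ceil(len(testimonials) / page_size)
    if pages == 0:
        return {"testimonials": [], "page": 0, "pages": 0}
    page = max(0, min(page, pages - 1))  # clamp to a valid page (an assumption)
    start = page * page_size
    return {
        "testimonials": testimonials[start:start + page_size],
        "page": page,
        "pages": pages,
    }
```

For 25 testimonials and a page size of 10, this yields three pages, the last holding five entries.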

Workflows

Workflows are sequences of actions that together aim towards a goal.

We list each workflow and mention the URLs and entities involved.

Most of the access control notes above are devoted to read access. Here we try to make explicit what is writable under what conditions.

The only workflows available to non-logged in users are, unsurprisingly, account creation and login.

Account creation flow

Vale does not wish to store passwords. It will store automation tokens on behalf of users to allow single button multi-publication.

Not storing passwords for many users lets us focus on locking down those automation tokens.

To that end, all login is via OAuth federated IDs. It is a goal of Vale to only require access to the email address as a unique identifier. There must not be two people with the same email on file, but a person may change their federation settings and the email as long as doing so does not introduce an email uniqueness conflict with another person.

The goals of this flow are to:

  1. create a valid AccountInfo record in storage by requesting attestation from OAuth
  2. create a PersonProfile populated with the user's name, but initially visible only to the newly created person
  3. flow into the login flow, replacing session cookies with ones that identify the current user as corresponding to the newly created Person
  4. provide options to quickly import profile info from sources like Stackoverflow, Github, Gitlab, vcards.
  5. redirect to the profile edit page so the person can fill in their profile before optionally making it visible to someone other than themselves. The UI should provide prominent warnings and reminders that everything is private at this stage. The UI should require an explicit opt-in before the person leaves the initial profile page while their name is still private, though keeping it private is a valid choice.

Login and Logout flow

The login flow:

  1. ensures that an HTTP session identifier is available
  2. invokes OAuth to resolve a claim
  3. if unsuccessful, presents the option to create an account or retry (TBD relationship to account recovery flow)
  4. else if successful, replaces session identifiers to point to the logged in user
  5. if the user logged in so as to see a URL that they could not access as public, redirect to that page

The logout flow:

  1. erases any info associating the session ID with a person
  2. redirects to / to re-infer a preferred locale from non-person associated info and leave the user at the home page where they can elect to log-in again, perhaps as a different person, or browse public info.

Neither flow involves Vale access control. Login requires access control mediated entirely by the OAuth provider. The OAuth claim is sufficient to authorize creation of a User object based on a matching AccountInfo record.
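The session handling in the two flows can be sketched as follows. The class, its method names, and the in-memory dict are hypothetical; the point illustrated is that a successful login issues a replacement session identifier instead of reusing the pre-login one, and that logout only erases the person association.

```python
import secrets
from typing import Optional


class Sessions:
    def __init__(self) -> None:
        # session id -> person id, or None for an anonymous session
        self._person_by_session: dict[str, Optional[int]] = {}

    def ensure(self, session_id: Optional[str]) -> str:
        """Login step 1: make sure a session identifier is available."""
        if session_id is None or session_id not in self._person_by_session:
            session_id = secrets.token_urlsafe(32)
            self._person_by_session[session_id] = None
        return session_id

    def login(self, old_session_id: str, person_id: int) -> str:
        """Login step 4: replace the identifier after a successful OAuth claim."""
        self._person_by_session.pop(old_session_id, None)
        new_session_id = secrets.token_urlsafe(32)
        self._person_by_session[new_session_id] = person_id
        return new_session_id

    def logout(self, session_id: str) -> None:
        """Logout step 1: erase the association with a person."""
        if session_id in self._person_by_session:
            self._person_by_session[session_id] = None

    def person(self, session_id: str) -> Optional[int]:
        return self._person_by_session.get(session_id)
```

Discarding the pre-login identifier means an identifier observed before login cannot later be used to act as the logged-in person.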

Library creation flow

User must be logged in.

The flow facilitates the following steps:

  1. Choose a name. Name must not conflict with existing names.
  2. Create a LibraryProfile record, with managedBy and visibleTo (held in LibraryManagement) initially set to the p- group associated with the current user's person.
  3. Proceed to the profile edit page in a way that encourages checking managedBy and visibleTo. Drop downs should make it easy to pick favourite groups for these.
  4. Profile edit page should encourage auto-importing URLs from a Git hosting site or equivalent.
  5. Whether the profile is edited or the user navigates away without editing, show a pop-up that encourages going to the versions list page for the library. Presuming the list is empty, that page will have instructions on how to publish and should encourage using the ValePublish action for their source hosting provider.

Publication flow

User must be logged in and in both the managedBy and visibleTo groups for the library.

If the user is not in the visibleTo group, they are not authorized to view the version list, so picking a non-conflicting semver version identifier would be leaking information about the library. If they are in managedBy they can change visibleTo to something that includes them, but that is a required step to avoid leakage or blurring the meaning of the two different access controls.
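A minimal sketch of that ordering: authorization is checked against both groups before the version list is consulted, so an unauthorized caller learns nothing about which versions exist. The function names and the `is_in_group(container, contained)` predicate are hypothetical stand-ins for whatever membership check storage provides.

```python
from typing import Callable, Hashable, Iterable

Membership = Callable[[Hashable, Hashable], bool]


def may_publish(person: Hashable, managed_by: Hashable, visible_to: Hashable,
                is_in_group: Membership) -> bool:
    """The person must be in both the managedBy and visibleTo groups."""
    return is_in_group(managed_by, person) and is_in_group(visible_to, person)


def semver_is_available(person: Hashable, managed_by: Hashable,
                        visible_to: Hashable, existing_versions: Iterable[str],
                        candidate: str, is_in_group: Membership) -> bool:
    """Report whether `candidate` is unused, but only to authorized callers."""
    if not may_publish(person, managed_by, visible_to, is_in_group):
        # Fail before looking at versions so conflicts leak nothing.
        raise PermissionError("not authorized to publish this library")
    return candidate not in set(existing_versions)
```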

Normally, publication should be initiated by the ValePublish action (see above). For Github, a Github action by that name handles the

Manual publication via the Vale web UI involves several steps:

  1. Identify the library
  2. Check that the current user is authorized
  3. Pick a semver tag and check that it does not conflict with an existing version of the library
  4. Pick a source for the version. Possible options include:
    1. A Git repo that is cloneable
    2. A tarball
  5. Retrieve a file tree from the source
  6. Check that it contains a config.temper.md for the named library and temper.out and temper.keep directories corresponding to the library with desired translations
  7. Forward source metadata, the bundle, and the desired library name and version pair to the publication system.
  8. Display a waiting page until notification of success or failure is received.
  9. On failure, point to help docs.
  10. On success, transition to the /l/[libraryname]/v/[semver] page.

Moderation flow

An authorized library manager may enter the moderation flow.

If the library's testimonialModeration setting is not Queue, the page displays that fact and links to library profile editing to enable changing that.

They are presented with three tabs: Queue, Accepted, Rejected. Initially the Queue is selected.

The Queue tab includes Accept All and Reject All buttons.

Regardless of which tab is chosen, there is a vertical listing of the current page of the testimonial bundle for testimonials of the library whose status is the status the tab displays.

Next to each testimonial are radio buttons. For the queue tab the radio buttons are: no action, accept, reject. For the accepted tab, the radio buttons are: no action, reject, back to queue. For the rejected tab, the radio buttons are: no action, accept, back to queue. All radio button text must be properly localized. The underlying values are the empty string for no-action, or the programmatic identifier for the corresponding testimonial status.
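The radio button values per tab amount to a small transition table, sketched below. The status identifiers (`"queued"`, `"accepted"`, `"rejected"`) are assumptions, since the text above does not name the programmatic identifiers; only the empty string for no-action comes from the text.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "queued"      # hypothetical programmatic identifiers
    ACCEPTED = "accepted"
    REJECTED = "rejected"


NO_ACTION = ""  # the underlying value for "no action"

# Radio button values offered on each tab, in display order.
OPTIONS = {
    Status.QUEUED: [NO_ACTION, Status.ACCEPTED.value, Status.REJECTED.value],
    Status.ACCEPTED: [NO_ACTION, Status.REJECTED.value, Status.QUEUED.value],
    Status.REJECTED: [NO_ACTION, Status.ACCEPTED.value, Status.QUEUED.value],
}


def apply_choice(current: Status, value: str) -> Status:
    """Validate a submitted radio value and return the resulting status."""
    if value not in OPTIONS[current]:
        raise ValueError(f"{value!r} is not offered on the {current.value} tab")
    return current if value == NO_ACTION else Status(value)
```

Validating on the server against the same table keeps a tampered form from submitting a value that the tab never offered.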

At the top and bottom are apply buttons. If a radio button was changed from "no action" but not applied, navigation away alerts to that fact.

Apply changes the status of the relevant testimonials in the database via the server, and refreshes so that the current page contains only the ones for the relevant tab.

Paging through testimonials in the testimonial bundle is accomplished by standard next/prev buttons.

There should be a convenient link to view the library page, so the moderator can see it with testimonials.

Messages from the browser to the server include a delta to the testimonial bundle, but the client should keep a cache of statuses and the server should respond with a delta for any it changed that the client incorporates into the cache. The cache can be filled with extra pages' content as requested. When switching tabs, the client just needs to check that it has a page with that status, requesting it as necessary, and rerender. There is no need to load a different browser page when tabbing, paging, or applying.
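The delta exchange might look like the following sketch. Everything here is illustrative: statuses are plain strings, the "database" is a dict from testimonial ID to status, and the function and class names are hypothetical. The server applies the requested changes and returns only what it actually changed; the client merges that response into its cache.

```python
class StatusCache:
    """Client-side cache of testimonial statuses."""

    def __init__(self) -> None:
        self.status: dict[int, str] = {}  # testimonial id -> status

    def merge(self, delta: dict[int, str]) -> None:
        self.status.update(delta)

    def ids_with(self, status: str) -> list[int]:
        # What a tab needs in order to rerender without a page load.
        return sorted(t for t, s in self.status.items() if s == status)


def server_apply(db: dict[int, str], delta: dict[int, str]) -> dict[int, str]:
    """Apply a client delta; return a delta of what actually changed."""
    changed: dict[int, str] = {}
    for testimonial_id, new_status in delta.items():
        if testimonial_id in db and db[testimonial_id] != new_status:
            db[testimonial_id] = new_status
            changed[testimonial_id] = new_status
    return changed
```

Because the client merges the server's response instead of its own request, entries the server ignored (unknown IDs, no-op changes) never enter the cache.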

On a library page, testimonial authors should always see their testimonials, but if they are queued or rejected, that decision should be made clear.

Rejected or Queued testimonials by other authors should never be visible on a library page, but if the current user is a manager of a library, on its library page they should see a discreet note when the queue is not empty.
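These two library-page rules can be sketched as a pair of filters. The dict shape and status strings are assumptions, and the sketch deliberately leaves out the separate visibleTo check on each testimonial and the library's testimonialModeration setting.

```python
def visible_on_library_page(testimonials: list[dict], viewer_id: int) -> list[dict]:
    """Accepted testimonials, plus the viewer's own in any status."""
    return [
        t for t in testimonials
        if t["status"] == "accepted" or t["author"] == viewer_id
    ]


def needs_decision_label(testimonial: dict, viewer_id: int) -> bool:
    """Authors see their own queued or rejected testimonials, marked as such."""
    return testimonial["author"] == viewer_id and testimonial["status"] != "accepted"


def show_queue_note(testimonials: list[dict], viewer_is_manager: bool) -> bool:
    """Managers get a discreet note when the queue is not empty."""
    return viewer_is_manager and any(t["status"] == "queued" for t in testimonials)
```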