diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4c494e85..b3ac358a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,5 +1,7 @@
 Hurray! We are glad that you want to contribute to our project! 👍
 
+If this is your first contribution, don't worry! We have a great [tutorial](https://www.youtube.com/watch?v=bgSDcTyysRc) to help you get started, and you can always ask us for help in the `#athens` channel on the [Gophers Slack](https://gophers.slack.com/messages/C9LRAQN8N). We'll give you whatever guidance you need.
+
 ## Verify your work
 
 Run `make verify` to run all the same validations that our CI process runs, such as checking that the standard go formatting is applied, linting, etc.
diff --git a/README.md b/README.md
index 88f71ecd..ae1d3021 100644
--- a/README.md
+++ b/README.md
@@ -77,6 +77,13 @@ If you're not ready to contribute code yet, there are plenty of other great ways
 
 The Athens project would not be possible without the amazing projects it builds on. Please see [SHOULDERS.md](./SHOULDERS.md) to see a list of them.
 
+# Coding Guidelines
+
+We all strive to write clean, readable code that everyone on the team can understand. To achieve that, we follow the principles described in Brian's talk "Code like the Go team".
+
+- [Printed version](https://learn-golang.com/en/goteam/)
+- [GopherCon RU talk](https://www.youtube.com/watch?v=MzTcsI6tn-0)
+
 # Code of Conduct
 
 This project follows the [Contributor Covenant](https://www.contributor-covenant.org/) (English version [here](https://www.contributor-covenant.org/version/1/4/code-of-conduct)) code of conduct.
diff --git a/docs/content/intro/_index.md b/docs/content/_index.md
similarity index 78%
rename from docs/content/intro/_index.md
rename to docs/content/_index.md
index e843e0d6..f6ef5f8c 100644
--- a/docs/content/intro/_index.md
+++ b/docs/content/_index.md
@@ -3,13 +3,8 @@ title: "Intro"
 date: 2018-02-11T16:52:23-05:00
 ---
 
+![Athens logo](/banner.png)
 ## Welcome to Athens
 
 Athens is the name of the combined project that includes a global registry for Go modules and a stand-alone proxy server that can be deployed on-premise to cache and control available Go modules for your organization.
 
-
-Helpful Links
-
-* Global Registry [Olympus]("/olympus")
-* Proxy Server
-* Github
\ No newline at end of file
diff --git a/docs/content/design/communication.md b/docs/content/design/communication.md
new file mode 100644
index 00000000..badfe774
--- /dev/null
+++ b/docs/content/design/communication.md
@@ -0,0 +1,63 @@
+---
+title: "Communication"
+date: 2018-02-11T15:57:56-05:00
+---
+
+## Communication flow
+
+This is the story of a time long ago. A time of myth and legend, when the ancient Gods were petty and cruel and they plagued builds with irreproducibility.
+Only one project dared to challenge their power... Athens. Athens possessed a strength the world had never seen: filling its cache.
+
+### Clean plate
+
+In the beginning, there is a theoretical state in which the caches of both the proxy and the registry are empty.
+
+When a user makes a request in this ancient time, the flow works as described below.
+
+- User contacts the proxy, asking for module M, version v1.0
+- Proxy checks whether or not it has this module in its storage. It does not.
+- So it responds with a redirect to Olympus and schedules a background job to fetch the module from Olympus.
+- User receives the redirect to Olympus and asks it for module M, version v1.0
+- Olympus doesn't have the module in its cache either, so it asks the underlying VCS (e.g. github.com) for the module.
+- After it receives all the bits, it stores them in its own cache and serves the module to the User.
+- User receives the module and is happy.
+- Concurrently, the proxy asks Olympus for module M, version v1.0 as well. Olympus, now aware of this module, serves it so the proxy can fill its own cache.
+
+![Communication flow for clear state](/athens-clear-scenario.png)
+
+### New proxy joins the party
+
+At this point, we have 1 proxy and 1 registry, each of them aware of module M. Now a new proxy joins with an empty cache.
+
+As we can see, the flow is very similar.
+
+- User contacts the new proxy, which checks its internal storage and finds that it is missing the module. The proxy then:
+  - Redirects the User to Olympus,
+  - Schedules a new cache fill job.
+- User contacts Olympus, which is aware of the module, and receives the response right away.
+- After some time, the proxy contacts Olympus and fills its cache.
+
+![Communication flow for new proxy](/athens-new-proxy-old-olympus-scenario.png)
+
+
+### Happy path
+
+Now that all proxies and Olympus are aware of module M at version v1.0, they can all serve that module to the user immediately, without redirecting or fetching it from the VCS.
+
+![Communication flow for filled caches](/athens-proxy-filled.png)
+
+
+### Private Code
+
+There are times when you do not want the mighty gods of Olympus to know about your desires. For example:
+- You are requesting a private module.
+- Communication between the proxy and Olympus is disabled.
+
+In this case:
+- User contacts the proxy, asking for a private module.
+- The proxy detects that this repo is private and checks its storage. It does not find the module there.
+- The proxy contacts the internal VCS directly.
+- The VCS responds with the module, which is then synchronously stored in the cache.
+- The module is served to the User.
+
+![Communication flow for private code](/athens-private-repo-scenario.png)
diff --git a/docs/content/design/proxy.md b/docs/content/design/proxy.md
new file mode 100644
index 00000000..c23aaae6
--- /dev/null
+++ b/docs/content/design/proxy.md
@@ -0,0 +1,73 @@
+---
+title: "Proxy"
+date: 2018-02-11T15:59:56-05:00
+---
+
+## The Athens Proxy
+
+The Athens project has two components: the [central registry](./REGISTRY.md) and edge proxies.
+This document details the latter.
+
+## The Role of the Proxy
+
+We intend proxies to be deployed primarily inside enterprises to:
+
+- Host private modules
+- Exclude access to public modules
+- Cache public modules
+
+Importantly, a proxy is not intended to be a complete _mirror_ of an upstream registry. For public modules, its role is to cache and provide access control.
+
+## Proxy Details
+
+First and foremost, a proxy exposes the same vgo download protocol as the registry. Since it doesn't have the same multi-cloud requirements as the registry, it supports simpler backend data storage mechanisms. We plan to release a proxy with several backends, including:
+
+- In-memory
+- Disk
+- RDBMS
+- Cloud blob storage
+
+Users who want to target a proxy configure their `vgo` CLI to point to the proxy, and then execute commands as normal.
+
+## Cache Misses
+
+When a user requests a module `MxV1` from a proxy and the proxy doesn't have `MxV1` in its cache, it first determines whether or not `MxV1` is private.
+
+If it's private, the proxy immediately does a cache fill operation from the internal VCS.
+
+If it's not private, the proxy consults its exclude list for non-private modules (see below). If `MxV1` is on the exclude list, the proxy returns 404 and does nothing else. If `MxV1` is not on the exclude list, the proxy executes the following algorithm:
+
+```
+registryDetails := lookupOnRegistry(MxV1)
+if registryDetails == nil {
+	return 404 // the registry doesn't have the module, so bail out
+}
+return registryDetails.baseURL
+```
+
+The important part of this algorithm is `lookupOnRegistry`. That function queries an endpoint on the registry that either:
+
+- Returns 404 if it doesn't have `MxV1` in the registry
+- Returns the base URL for `MxV1` if it has `MxV1` in the registry
+
+Finally, if `MxV1` is fetched from a registry server, a background job will be created to periodically check `MxV1` for deletions and/or deprecations. If either happens, the proxy will delete `MxV1` from its cache.
+
+_In a later version of the project, we may implement an event stream on the registry that the proxy can subscribe to, listening for deletions/deprecations of the modules it cares about._
+
+## Exclude Lists and Private Module Filters
+
+To accommodate private (i.e. enterprise) deployments, the proxy maintains two important access control mechanisms:
+
+- Private module filters
+- Exclude lists for public modules
+
+### Private Module Filters
+
+Private module filters are string globs that tell the proxy which modules are private. For example, the string `github.internal.com/**` tells the proxy:
+
+- To never make requests to the public internet (i.e. to the registry) regarding matching modules
+- To download module code (in its cache filling mechanism) from the VCS at `github.internal.com`
+
+### Exclude Lists for Public Modules
+
+Exclude lists for public modules are also globs that tell the proxy which modules it should never download from the registry. For example, the string `github.com/arschles/**` tells the proxy to always return `404 Not Found` to clients requesting any matching module.
\ No newline at end of file
diff --git a/docs/content/design/registry.md b/docs/content/design/registry.md
new file mode 100644
index 00000000..b918b517
--- /dev/null
+++ b/docs/content/design/registry.md
@@ -0,0 +1,213 @@
+---
+title: "Registry"
+date: 2018-02-11T15:58:56-05:00
+---
+
+## The Athens Registry
+
+The Athens registry is a Go package registry service that is hosted globally across multiple cloud providers. The **global deployment** will have a DNS name (e.g. `registry.golang.org`) that round-robins across each **cloud deployment**. We will use the following **cloud deployments** for _example only_ in this document:
+
+- Microsoft Azure (hosted at `microsoft.registry.golang.org`)
+- Google Cloud (hosted at `google.registry.golang.org`)
+- Amazon AWS (hosted at `amazon.registry.golang.org`)
+
+Regardless of which **cloud deployment** a request is routed to, the **global deployment** must provide up-to-date (precise definition below) module metadata & code.
+
+We intend to create a foundation (the TBD foundation) that manages **global deployment** logistics and governs how each **cloud deployment** participates.
+
+## Glossary
+
+In this document, we will use the following keywords and symbols:
+
+- `OA` - the registry **cloud deployment** hosted on Amazon AWS
+- `OG` - the registry **cloud deployment** hosted on Google Cloud
+- `OM` - the registry **cloud deployment** hosted on Microsoft Azure
+- `MxVy` - the module `x` at version `y`
+
+## Properties of the Registry
+
+The registry should obey the following invariants:
+
+- No existing module or version should ever be deleted or modified
+  - Except for exceptional cases, like a DMCA takedown (more below)
+- Module metadata & code may be eventually consistent across **cloud deployments**
+
+These properties are important both for designing the **global deployment** and for ensuring repeatable builds in the Go community as much as possible.
+
+## Technical Challenges
+
+A registry **cloud deployment** has two major concerns:
+
+- Sharing module metadata & code
+- Staying current with which other registry **cloud deployment**s are available
+
+For the rest of this document, we’ll refer to these concerns as **data exchange** and **membership**, respectively.
+
+Registries will use separate protocols for **data exchange** and **membership**.
+
+## Data Exchange
+
+The overall design of the **global deployment** should ensure the following:
+
+- Module metadata and code are fetched from the appropriate source (i.e. a VCS)
+- Module metadata and code are replicated across all **cloud deployment**s. As previously stated, replication may be eventually consistent.
+
+Each **cloud deployment** holds:
+
+- A module metadata database
+- A log of actions it has taken on the database (used to version the module database)
+- Actual module source code and metadata
+  - This is what vgo requests
+  - Likely stored in a CDN
+
+The module database holds metadata and code for all modules that the **cloud deployment** is aware of, and the log records all the operations the **cloud deployment** has performed in its lifetime.
+
+## The Module Database
+
+The module database is made up of two components:
+
+- A blob storage system (usually a CDN) that holds module metadata and source code
+  - This is called the module CDN
+- A key/value store that indicates whether and where a module `MxV1` exists in the **cloud deployment**'s blob storage
+  - This is called the module metadata database, or key/value storage
+
+If a **cloud deployment** OM holds modules `MxV1`, `MxV2` and `MyV1`, its module metadata database would look like the following:
+
+```
+Mx: {baseLocation: mycdn.com/Mx}
+My: {baseLocation: mycdn.com/My}
+```
+
+Note that `baseLocation` is intended for use in the `` redirect response passed to vgo. As a result, it may point to other **cloud deployment** blob storage systems. More on that in the sync sections below.
+
+## The Log
+
+The log is an append-only record of actions that a **cloud deployment** OM has taken on its module database. The log exists only to facilitate module replication between **cloud deployment**s (more on replication below).
+
+Below is an example event log:
+
+```
+ADD MxV1 ID1
+ADD MxV2 ID2
+ADD MyV1.5 ID3
+```
+
+This log corresponds to a database that looks like the following:
+
+```
+Mx: {baseLocation: mycdn.com/Mx}
+My: {baseLocation: mycdn.com/My}
+```
+
+It also corresponds to blob storage that holds versions 1 and 2 of Mx and version 1.5 of My.
+
+### Log IDs
+
+Note that each event log line holds ID data (`ID1`, `ID2`, etc.). Other **cloud deployment**s use these IDs as database versions. Details on how these IDs are used are below in the pull sync section.
+
+## Cache Misses
+
+If an individual **cloud deployment** OM gets a request for a module MxV1 that is not in its database, it returns a "not found" (i.e. HTTP 404) response to vgo. Then, the following happens:
+
+- OM starts a background cache fill operation to look for MxV1 on OA and OG
+  - If OA and OG both report a miss, OM does a cache fill operation from the VCS and does a push synchronization (see below)
+- vgo downloads code directly from the VCS on the client's machine
+
+## Pull Sync
+
+Each **cloud deployment** will actively sync its database with the others. On every timer tick `T`, a **cloud deployment** OM will query another **cloud deployment** OA for all the modules that were added or changed since the last time OM synced with OA.
+
+### Query Mechanism
+
+The query relies on OA being able to provide deltas of its database over logical time. Logical time is communicated between OM and OA with log IDs (described above). The query algorithm is approximately:
+
+```
+lastID := getLastQueriedID(OA)
+newDB, newID := query(OA, lastID) // get the new operations that happened on OA's database since lastID
+mergeDB(newDB) // merge newDB into my own DB
+storeLastQueriedID(OA, newID) // after this, getLastQueriedID(OA) will return newID
+```
+
+The two most important parts of this algorithm are the `newDB` response and the `mergeDB` function.
+
+#### Database Diffs
+
+OA uses its database log to construct a database diff starting from the `lastID` value that it receives from OM. It then sends the diff to OM as JSON that looks like the following:
+
+```json
+{
+  "added": ["MxV1", "MxV2", "MyV1"],
+  "deleted": ["MaV1", "MbV2"],
+  "deprecated": ["MdV1"]
+}
+```
+
+Explicitly, this structure indicates that:
+
+- `MxV1`, `MxV2` and `MyV1` were added since `lastID`
+- `MaV1` and `MbV2` were deleted since `lastID`
+- `MdV1` was deprecated since `lastID`
+
+#### Database Merging
+
+The `mergeDB` function above receives a database diff and merges the new entries into its own database. It follows a few rules:
+
+- Deletes insert a tombstone into the database
+- If a module `MdV1` is tombstoned, all future operations on it that come via database diffs are sent to `/dev/null`
+- If a module `MdV2` is deprecated, future add or deprecation diffs for `MdV2` are sent to `/dev/null`. Future delete operations can still tombstone it.
+
+The approximate algorithm for `mergeDB` is this:
+
+```
+func mergeDB(newDB) {
+	for _, added := range newDB.added {
+		fromDB := lookup(added)
+		if fromDB != nil {
+			continue // the module already exists (it may be deprecated or tombstoned), so skip it
+		}
+		addToDB(added) // this adds the module to the module db's key/value store, but points baseLocation to the other cloud deployment's blob storage
+		go downloadCode(added) // this downloads the module to local blob storage, then updates the key/value store's baseLocation accordingly
+	}
+	for _, deprecated := range newDB.deprecated {
+		fromDB := lookup(deprecated)
+		if fromDB != nil && fromDB.deleted() {
+			continue // can't deprecate something that's already deleted
+		}
+		deprecateInDB(deprecated) // importantly, this function inserts a deprecation record into the DB even if the module wasn't already present!
+	}
+	for _, deleted := range newDB.deleted {
+		deleteInDB(deleted) // importantly, this function inserts a tombstone into the DB even if the module wasn't already present!
+	}
+}
+
+```
+
+## Push Sync
+
+If a **cloud deployment** OM has a cache miss on a module MxV1, does a cache fill operation, and discovers that neither OG nor OA has MxV1, it fills from the VCS. After it finishes the fill operation, it saves the module code and metadata to its module database and adds a log entry for it. The algorithm looks like the following:
+
+```
+newCode := fillFromVCS(MxV1)
+storeInDB(newCode)
+storeInLog(newCode)
+pushTo(OA, newCode) // retry and give up after N failures
+pushTo(OG, newCode) // retry and give up after N failures
+```
+
+The `pushTo` function is the most important part of this algorithm. It _only_ announces the existence of a new module, without any event log metadata (i.e. `lastID`):
+
+```
+func pushTo(OA, newCode) {
+	http.POST(OA, newCode.moduleName, newCode.moduleVersion, "https://OM.com/fetch")
+}
+```
+
+The endpoint in OA that receives the HTTP `POST` request in turn does the following:
+
+```
+func receive(moduleName, moduleVersion, fetchURL) {
+	addToDB(moduleName, moduleVersion, OM) // stores moduleName and moduleVersion in the key/value store, with baseLocation pointing to OM
+	go downloadCode(moduleName, moduleVersion) // this downloads the module to local blob storage, then updates the key/value store's baseLocation accordingly
+}
+```
+Note again that `lastID` is not sent. Future pull syncs that OA does from OM will receive moduleName/moduleVersion in the `added` section, and OA will correctly do nothing because it already has moduleName/moduleVersion.
\ No newline at end of file
diff --git a/docs/content/intro/components.md b/docs/content/intro/components.md
new file mode 100644
index 00000000..a0026f02
--- /dev/null
+++ b/docs/content/intro/components.md
@@ -0,0 +1,30 @@
+---
+title: "Components"
+date: 2018-02-11T16:57:56-05:00
+---
+
+From a high-level view, we recognize 4 major components of the system.
+
+### Client
+
+The client is a user running a Go binary with module support. At the time of writing, this is `go1.11beta3`.
+
+### VCS
+
+The VCS is an external source of data for Athens. Athens scans various VCSs, such as `github.com`, and fetches sources from them.
+
+### Proxy - Athens
+
+We intend proxies to be deployed primarily inside enterprises to:
+
+* Host private modules
+* Exclude access to public modules
+* Cache public modules
+
+Importantly, a proxy is not intended to be a complete mirror of an upstream registry. For public modules, its role is to cache and provide access control.
+
+### Registry - Olympus
+
+The Athens registry is a Go package registry service that is hosted globally across multiple cloud providers. The global deployment will have a DNS name (e.g. `registry.golang.org`) that round-robins across each cloud deployment.
+
+The role of Olympus is to provide up-to-date module metadata & code.
\ No newline at end of file
diff --git a/docs/content/intro/first-content.md b/docs/content/intro/first-content.md
index b55457ae..c649c342 100644
--- a/docs/content/intro/first-content.md
+++ b/docs/content/intro/first-content.md
@@ -1,5 +1,12 @@
 ---
-title: "First Content"
-date: 2018-02-11T16:52:56-05:00
+title: "Athens 101"
+date: 2018-02-11T16:59:56-05:00
 ---
- First Post!
+
+## What is Athens?
+
+In short: Athens is a project built on top of vgo (or go1.11+) that brings dependencies closer to you, so you can count on repeatable builds even when a VCS is down.
+
+The big goal of Athens is to provide a new place where dependencies, not code, live. Dependencies are immutable blobs of code and associated metadata that come from GitHub. They live in storage that Athens controls.
+
+You probably already know what “immutable” means, but let me just point it out again because it’s really important for this whole system. When folks change their packages, iterate, experiment, or whatever else, code on Athens won’t change. If the package author releases a new version, Athens will pull that down and it’ll show up. So if you depend on package M version v1.2.3, it will never change on Athens. _Not even after a force push, not even after the repo ceases to exist_.
\ No newline at end of file
diff --git a/docs/content/intro/protocol.md b/docs/content/intro/protocol.md
new file mode 100644
index 00000000..bb71d7b1
--- /dev/null
+++ b/docs/content/intro/protocol.md
@@ -0,0 +1,73 @@
+---
+title: "Download protocol"
+date: 2018-02-11T16:58:56-05:00
+---
+
+Athens builds on top of the Go CLI, which specifies a set of endpoints it uses to communicate with external proxies that provide modules. We call this set of endpoints the _Download Protocol_.
+
+The original vgo research paper on the Download Protocol can be found here: https://research.swtch.com/vgo-module
+
+Each of these endpoints sits on top of a module. Let's assume a module `htp` authored by `acidburn`.
+
+So for each of the endpoints mentioned below, we will assume the address `acidburn/htp/@v/{endpoint}` (e.g. `acidburn/htp/@v/list`).
+
+## List of versions
+
+This endpoint returns a list of versions that Athens knows about for `acidburn/htp`. The versions are separated by newlines:
+
+```HTTP
+GET athens.io/acidburn/htp/@v/list
+```
+
+```
+v1.0.0
+v1.1.0
+v2.0.0
+```
+
+## Version info
+
+
+```HTTP
+GET athens.io/acidburn/htp/@v/v1.0.0.info
+```
+
+This returns JSON with information about v1.0.0. It looks like this:
+
+```json
+{
+  "Name": "v1.0.0",
+  "Short": "v1.0.0",
+  "Version": "v1.0.0",
+  "Time": "1972-07-18T12:34:56Z"
+}
+```
+
+## go.mod file
+
+```HTTP
+GET athens.io/acidburn/htp/@v/v1.0.0.mod
+```
+
+This returns the go.mod file for version v1.0.0. If athens.io/acidburn/htp version `v1.0.0` has no dependencies, the response body would look like this:
+
+```
+module "athens.io/acidburn/htp"
+```
+
+## Module sources
+
+```HTTP
+GET athens.io/acidburn/htp/@v/v1.0.0.zip
+```
+
+This is what it sounds like: it sends back a zip file with the source code for the module at version v1.0.0.
+
+## Latest
+
+```HTTP
+GET athens.io/acidburn/htp/@latest
+```
+
+This endpoint returns the latest version of the module.
+If no tagged version exists, it should retrieve the hash of the latest commit instead.
diff --git a/docs/static/athens-clear-scenario.png b/docs/static/athens-clear-scenario.png
new file mode 100644
index 00000000..022e67d9
Binary files /dev/null and b/docs/static/athens-clear-scenario.png differ
diff --git a/docs/static/athens-new-proxy-old-olympus-scenario.png b/docs/static/athens-new-proxy-old-olympus-scenario.png
new file mode 100644
index 00000000..23c29a21
Binary files /dev/null and b/docs/static/athens-new-proxy-old-olympus-scenario.png differ
diff --git a/docs/static/athens-private-repo-scenario.png b/docs/static/athens-private-repo-scenario.png
new file mode 100644
index 00000000..b989b375
Binary files /dev/null and b/docs/static/athens-private-repo-scenario.png differ
diff --git a/docs/static/athens-proxy-filled.png b/docs/static/athens-proxy-filled.png
new file mode 100644
index 00000000..25dc5c41
Binary files /dev/null and b/docs/static/athens-proxy-filled.png differ
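The `mergeDB` rules in `docs/content/design/registry.md` can be exercised with a small in-memory Go sketch. A plain map stands in for the key/value store, and the names `entryState`, `dbDiff`, and `mergeDB` are hypothetical. Blob downloads and `baseLocation` updates are left out. Like the pseudocode in that file, the sketch processes additions first, then deprecations, then deletions.

```go
package main

// entryState records what the (hypothetical) local key/value store knows
// about one module version; the zero value means "not present".
type entryState int

const (
	stateUnknown entryState = iota
	stateAdded
	stateDeprecated
	stateTombstoned
)

// dbDiff mirrors the JSON diff shape ("added", "deleted", "deprecated").
type dbDiff struct {
	Added      []string `json:"added"`
	Deleted    []string `json:"deleted"`
	Deprecated []string `json:"deprecated"`
}

// mergeDB applies diff to db following the registry merge rules:
// deletes always tombstone, tombstoned entries ignore everything else,
// and deprecated entries ignore further adds.
func mergeDB(db map[string]entryState, diff dbDiff) {
	for _, m := range diff.Added {
		if db[m] != stateUnknown {
			continue // already present, deprecated, or tombstoned: skip
		}
		db[m] = stateAdded // the design would also start a background download here
	}
	for _, m := range diff.Deprecated {
		if db[m] == stateTombstoned {
			continue // can't deprecate something that's already deleted
		}
		db[m] = stateDeprecated // recorded even if the module wasn't present before
	}
	for _, m := range diff.Deleted {
		db[m] = stateTombstoned // recorded even if the module wasn't present before
	}
}
```

Deletions are applied last. As a result, a diff that both adds and deletes the same module leaves it tombstoned, which matches the rule that delete operations can always tombstone.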