Globus and Amazon S3
Amazon S3
All AWS regions—except for regions in China—are accessible through one Globus collection:
If you are working with requester-pays buckets, we have a separate Globus collection you can use:
Before continuing, you may need to do some one-time setup work. You will need to…
-
Review the service limitations, to see if they will affect your usage. If you are still OK using Globus, your next step will be to…
-
Create an IAM User, with an access key (officially an “Access Key ID”) and a secret key (officially a “Secret Access Key”). Each person may only have one IAM User associated with Globus. Once your IAM User is created, you will need to…
-
Grant the IAM User access to buckets. For S3 buckets in other accounts, this will also give your IAM User permission to reach out to those other accounts. Finally, you need to…
-
Load the IAM User credentials into Globus. This gives Globus access to S3, with the credentials you provide.
Once the one-time setup work is complete, you should proceed to access your files on S3.
If you will be accessing S3 buckets in other accounts, those account owners will also need to grant access to your IAM User.
Each step is described in detail below.
Service Limitations
Globus has a number of limitations when working with Amazon S3. These limits do not affect most common use cases, but they might affect you, so you should review them before starting to use Globus with Amazon S3.
Globus for Amazon S3 does not support the following S3 features:
-
Custom Metadata / Tags: Custom Metadata and tags on existing objects are ignored when those objects are downloaded, and new objects do not have custom metadata or tags set.
-
Versions: When downloading a file from an S3 bucket, Globus will always access the latest version.
-
ACLs: ACLs on existing objects (and the bucket) will influence what you can download through Globus, but those ACLs are not copied out of S3. When uploading new objects, ACLs are not explicitly set, and so inherit any bucket-level ACL that is set.
-
Additional Checksum Algorithms: At this time, if an object in S3 needs to be verified, Globus will re-download it in order to compute the checksum.
-
Storage classes that are not S3 Standard: Globus will always upload objects as Standard S3. Globus can download objects that are stored with any storage class, with one exception: Objects stored in Glacier must be retrieved before they may be downloaded.
These limitations are present for two reasons:
-
Globus supports only a common set of features between storage platforms, to make file transfers as portable as possible.
-
The Globus-for-S3 add-on supports other platforms which speak S3. Not all S3-speaking products support Amazon S3 features.
For users who want to upload data into a different storage class (that is not S3 Standard), we suggest using AWS Lambda with Amazon S3, so that as soon as an object is uploaded to a bucket, the Lambda function triggers and sets the correct storage class.
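The Lambda approach suggested above could look roughly like the sketch below. This is a minimal illustration, not a Globus-provided function: the `STANDARD_IA` target class, the function names, and the optional `s3` parameter (added so the logic can be exercised without AWS) are all assumptions. It copies each newly created object onto itself with a new storage class, and skips objects already in the target class so that the copy does not re-trigger the function indefinitely.

```python
from urllib.parse import unquote_plus

# Assumption: replace with whichever storage class you actually need.
TARGET_STORAGE_CLASS = "STANDARD_IA"


def change_storage_class(s3, bucket, key, target=TARGET_STORAGE_CLASS):
    """Copy an object onto itself with a new storage class.

    Returns True if a copy was issued, or False if the object was
    already in the target class. The False case matters because the
    copy itself creates a new object event for the same key.
    """
    head = s3.head_object(Bucket=bucket, Key=key)
    # S3 omits StorageClass from the response for S3 Standard objects.
    current = head.get("StorageClass", "STANDARD")
    if current == target:
        return False
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass=target,
        MetadataDirective="COPY",
    )
    return True


def handler(event, context, s3=None):
    """Lambda entry point for S3 object-created notifications."""
    if s3 is None:
        import boto3  # provided by the AWS Lambda Python runtime

        s3 = boto3.client("s3")
    changed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if change_storage_class(s3, bucket, key):
            changed.append((bucket, key))
    return changed
```

A single copy request only works for objects up to 5 GB; larger objects would need a multipart copy, which this sketch does not attempt. The function's execution role also needs permission to read and write objects in the bucket.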
If you are OK with the limitations above, you should move on to creating an IAM User, which Globus will need to interact with S3.
IAM User Configuration
Globus for S3 requires an IAM User to interact with Amazon S3. Globus only allows each person to add one IAM User, so if you need to access buckets in other accounts, you will be configuring cross-account access. This section will explain how to set up your IAM User for access to local buckets and for cross-account access.
This section assumes the following fictitious environment:
-
AWS Account 123456789 has two buckets, one for Leland Stanford Senior and one for Leland Stanford Junior.
-
AWS Account 312665112 has a bucket containing files related to the Central Pacific Railroad.
-
AWS Account 650121554 has a bucket containing files related to the newly-founded Leland Stanford Junior University.
-
In AWS Account 123456789, an IAM User for Leland Senior—meant specifically for Globus use—has access to his own bucket in the local AWS account, as well as the buckets in the Railroad and University AWS accounts.
Creating an IAM User
Before granting access, you must first create an IAM User. Within the AWS Console, navigate to the IAM section, and click on Users. Then click on the “Add users” button.
Give your user a name, and select the “Access key - Programmatic access” box. Click “Next: Permissions”.
For now, we are not setting any permissions. Click “Next: Tags”.
It is helpful to set a tag indicating who is using this IAM User. Set a tag with key “user” and a value of your SUNetID. Click “Next: Review” and then “Create user”.
AWS creates the user and creates a new Access key. Copy the Access key ID and Secret access key.
WARNING: The Secret access key is very sensitive. Keep it in a safe place, and delete your local copy as soon as you have entered it into the Globus web site.
You have now created an IAM User! You can now proceed to give it permissions.
Assigning Local Permissions
Now that your IAM User is created, you need to grant it permission to access buckets. In addition to local buckets (which live in the same AWS account as your IAM User), you must also grant access to buckets in other AWS accounts.
Within the AWS Console, navigate to the IAM section, and click on Users.
Locate your IAM User and click on its name.
On the page which appears, click on “Add inline policy”.
The Visual Editor will open.
The Visual Editor supports granting multiple permissions in a single policy. The red boxes show where you can define a single permission and add additional permissions. Each permission has four parts:
-
The service which it applies to.
-
The specific actions that are allowed.
-
The resources where the permission applies.
-
Optional conditions which must be met in order for the permission to apply.
You need to define three permissions. All three permissions will be for the “S3” service, and no optional conditions will be applied.
Global Permissions
First, you need to allow the ListAllMyBuckets and GetBucketLocation actions, on all resources. Globus uses the ListAllMyBuckets permission to populate a list of buckets available in the local AWS account; it uses the GetBucketLocation permission to identify the AWS region for a bucket.
Bucket-level Permissions
Next, you need to allow the ListBucket and ListBucketMultipartUploads actions. ListBucket allows Globus to get a list of the objects in a bucket. ListBucketMultipartUploads is used by Globus when uploading objects to a bucket.
You should limit this permission to only the specific buckets that need to be accessed through Globus. Amazon identifies resources by ‘ARN’. The ARN format for an S3 bucket is arn:aws:s3:::BUCKET_NAME. You need to include the ARN of every bucket that Globus will be allowed to access.
Object-level Permissions
Finally, you need to allow all of the permissions needed for uploading, downloading, and deleting:
-
To allow downloading files from S3, you need to allow the GetObject action.
-
To allow uploading files to S3, you need to allow the PutObject, ListMultipartUploadParts, and AbortMultipartUpload actions.
-
To allow deleting files from S3, you need to allow the DeleteObject action.
You should limit this permission to all of the objects in the specific buckets that need to be accessed through Globus. The ARN format for S3 objects is arn:aws:s3:::BUCKET_NAME/*. You need to include the ARN of every bucket that Globus will be allowed to access. You can limit the permission to specific objects or prefixes, but this is not recommended except in advanced use cases.
Local Permissions Summary
In the end, you should have three permissions, matching the pictures above. If you look at the policy in the JSON editor, it should look similar to this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Bucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::leland_sr_files",
                "arn:aws:s3:::cp_rr_files",
                "arn:aws:s3:::su_board_files"
            ]
        },
        {
            "Sid": "Objects",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::leland_sr_files/*",
                "arn:aws:s3:::cp_rr_files/*",
                "arn:aws:s3:::su_board_files/*"
            ]
        }
    ]
}
Review the policy, give it a name, and save. The policy will take effect within a minute or two of it being saved.
As your access changes in the future, you should edit the Resource sections, adding and removing buckets as needed. Remember to modify both Resource sections that have bucket names!
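Because the bucket list appears in two Resource sections, it is easy for them to drift apart when editing by hand. One way to avoid that is to generate the policy from a single list of bucket names. The sketch below uses only the Python standard library; the function name `globus_s3_policy` is an illustrative assumption, and the output is meant to be pasted into the JSON editor, not applied automatically.

```python
import json


def globus_s3_policy(bucket_names):
    """Build the three-statement inline policy described above.

    Both Resource lists are derived from the same bucket_names list,
    so they cannot get out of sync.
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": "*",
            },
            {
                "Sid": "Bucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucketMultipartUploads", "s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                "Sid": "Objects",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # Bucket names taken from the fictitious environment on this page.
    buckets = ["leland_sr_files", "cp_rr_files", "su_board_files"]
    print(json.dumps(globus_s3_policy(buckets), indent=4))
```

Adding or removing a bucket then means changing one list and regenerating the JSON.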
Now that your local access has been set, the owners of the other AWS accounts need to grant you access to their buckets.
Assigning Cross-Account Permissions
If you own an Amazon S3 bucket, and you would like to give access to a Stanford Globus user, you have two options:
-
If you have your own Globus endpoint, you can make a Guest Collection, and share it with the Stanford Globus user.
-
You can give the Stanford Globus user access to your Amazon S3 bucket through Amazon S3, by giving access to their IAM User.
This section explains how you (an Amazon S3 bucket owner) can give access to someone else’s IAM User, such that they can access the Amazon S3 bucket with Globus.
In order to grant access, you will need their IAM User’s ARN. The person’s IAM User ARN should have the format arn:aws:iam::AWS_ACCOUNT_NUMBER:user/USER_NAME.
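Before pasting an ARN you were given into a bucket policy, it can help to check that it really has the format above. The following is a small sketch for that check; the names `IamUser` and `parse_iam_user_arn` are illustrative assumptions. It expects a 12-digit account number, as real AWS accounts have, so the shortened account numbers used in this page's fictitious environment would not pass.

```python
import re
from typing import NamedTuple


class IamUser(NamedTuple):
    account_id: str
    user_name: str


# arn:aws:iam::AWS_ACCOUNT_NUMBER:user/USER_NAME
_IAM_USER_ARN = re.compile(r"arn:aws:iam::(\d{12}):user/(.+)")


def parse_iam_user_arn(arn):
    """Split an IAM User ARN into its account number and user name.

    Raises ValueError if the string is not shaped like an IAM User ARN.
    """
    match = _IAM_USER_ARN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM User ARN: {arn!r}")
    return IamUser(account_id=match.group(1), user_name=match.group(2))
```

For example, `parse_iam_user_arn("arn:aws:iam::123456789012:user/leland_sr")` yields the account number `123456789012` and the user name `leland_sr`, while a role ARN or a bucket ARN raises `ValueError`.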
Within the AWS Console, navigate to the S3 section and click on your bucket’s name.
Click on the Permissions tab.
Scroll down to the Bucket policy section and click on Edit.
The policy editor starts with a ‘null’ policy, which shows you the different parts but does not actually do anything.
Replace that null policy with the following JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Bucket",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::AWS_ACCOUNT_NUMBER:user/USER_NAME"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": "arn:aws:s3:::BUCKET_NAME"
        },
        {
            "Sid": "Objects",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::AWS_ACCOUNT_NUMBER:user/USER_NAME"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": "arn:aws:s3:::BUCKET_NAME/*"
        }
    ]
}
Replace BUCKET_NAME with your bucket’s name in both Resource entries, and replace both Principal ARNs with the IAM User ARN you were given.
The above policy provides full access to download from, upload to, and delete objects in the bucket. You can limit access by tailoring the policy:
-
To remove access to upload files, remove the s3:PutObject, s3:ListMultipartUploadParts, and s3:AbortMultipartUpload permissions.
-
To remove access to delete objects, remove the s3:DeleteObject permission.
-
To limit which part of the bucket the user may access, tailor the Resource section in the second policy statement.
If you make changes, make sure to remove any excess commas (the last item in a JSON list must have no comma). Make sure there are no JSON Security notices, Errors, or Warnings. Finally, click on Save changes.
Within a minute or two of you saving changes, they will take effect. As long as the Stanford Globus user has finished the IAM configuration on their end, they will now be able to access your bucket.
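The bucket policy in this section, including the tailored variants, can also be generated instead of edited by hand, which sidesteps stray-comma mistakes. The sketch below is one possible way to do that with the Python standard library; the function name and its `allow_upload`, `allow_delete`, and `prefix` parameters are illustrative assumptions that mirror the three tailoring options listed earlier.

```python
import json


def cross_account_bucket_policy(user_arn, bucket_name,
                                allow_upload=True, allow_delete=True,
                                prefix=""):
    """Build a bucket policy granting one IAM User access for Globus.

    prefix limits the object-level statement to keys starting with it;
    the default empty prefix covers every object in the bucket.
    """
    object_actions = ["s3:GetObject"]
    if allow_upload:
        object_actions += [
            "s3:PutObject",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload",
        ]
    if allow_delete:
        object_actions.append("s3:DeleteObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Bucket",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "Objects",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values: a download-only grant on one bucket.
    policy = cross_account_bucket_policy(
        "arn:aws:iam::AWS_ACCOUNT_NUMBER:user/USER_NAME",
        "BUCKET_NAME",
        allow_upload=False,
        allow_delete=False,
    )
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket policy editor, which will still report any security notices, errors, or warnings before you save.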
Loading Credentials into Globus
With an IAM User created, you can now upload your credentials to Globus.
Using the link at the top of the page, access the S3 collection. You might be asked to log in; if so, log in through Stanford University.
You will be taken to the main page for the collection. Click on the ‘Credentials’ tab.
If this is the first time you accessed this collection, you will be asked to give consent for Globus to store your S3 credentials. Click ‘Continue’.
Some institutions allow you to have multiple accounts. Stanford only allows one SUNetID per person, so click on your SUNetID.
Finally, click on ‘Allow’ to give Globus permission to store your S3 credentials.
Once consent is granted, you will be asked to enter your IAM User credentials. Enter the Access Key ID and Secret Key from when you created your IAM User.
If you go to the Credentials tab after entering an IAM User credential, you will see the Access Key ID, along with an option to replace the credential (the gear icon) or delete the credential (the trashcan icon).
You should now proceed to accessing the collection!
Accessing Files on S3
With AWS IAM User credentials loaded and permissions granted, you may now proceed to access your data on Amazon S3 through Globus!
Using the link at the top of the page, access the S3 collection. You might be asked to log in; if so, log in through Stanford University.
Click on the “Open in File Manager” button. That will take you to the File Manager and connect to S3.
First-Time Access
The first time you access the S3 collection, you will be asked for consent.
When you first loaded your credentials, you gave Globus consent to store those credentials for you. Now, you are giving Globus consent to actually use those credentials to talk to S3. Click the “Continue” button.
Some institutions allow you to have multiple accounts. Stanford only allows one SUNetID per person, so click on your SUNetID.
Finally, click on ‘Allow’ to give Globus permission to use your AWS IAM User credentials to access Amazon S3.
Subsequent Accesses
When you access the S3 collection—assuming you have previously provided consent—you should be greeted with a list of the buckets from your AWS Account.
To access one of the buckets—assuming your IAM User has permissions—double-click on the bucket’s name.
If your IAM User does not have permissions, attempting to list the contents of the bucket will give an error.
Once you have access to a bucket, you can transfer files in and out like any other Globus collection.
Cross-Account Bucket Access
Before accessing the bucket, your IAM User must have a policy attached giving it permissions to access the bucket. Also, the owner of the bucket must attach a policy giving your IAM User access.
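The requirement that both sides grant access can be pictured with a deliberately simplified check: a request succeeds only if the IAM User's own policy and the bucket owner's policy each contain a matching Allow. The sketch below is an illustration of that idea only, not how AWS evaluates policies in full; it ignores Deny statements, Principal and Condition elements, and every other rule AWS applies, and the function names are assumptions.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def policy_allows(policy, action, resource):
    """Return True if some Allow statement matches action and resource.

    Wildcards such as "*" in the policy are matched shell-style.
    """
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        action_ok = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(statement["Action"])
        )
        resource_ok = any(
            fnmatchcase(resource, pattern)
            for pattern in _as_list(statement["Resource"])
        )
        if action_ok and resource_ok:
            return True
    return False


def cross_account_allowed(identity_policy, bucket_policy, action, resource):
    """Cross-account access needs an Allow on both sides."""
    return (policy_allows(identity_policy, action, resource)
            and policy_allows(bucket_policy, action, resource))
```

In terms of this page's fictitious environment, if the IAM User's inline policy lists `arn:aws:s3:::cp_rr_files` but the Railroad account has not yet attached a bucket policy allowing `s3:ListBucket`, `cross_account_allowed` returns False, which corresponds to the listing error you would see in Globus.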
If your IAM User has been given access to a bucket in a different AWS account, that bucket will not appear in the list of buckets that you see when you first access the collection.
To actually access the bucket, manually enter the bucket name into the Path field.
When you press the Enter key, Globus will attempt to access the bucket. If your permissions are correct, you will see a file listing.