Atlas Data Federation supports Google Cloud Storage buckets as federated database instance stores. You must define mappings in your federated database instance to your Cloud Storage bucket to run queries against your data.
Note
We refer to objects as files and delimiter-separated prefixes as directories in this page. However, these object storage services aren't actually file systems and don't have the same behaviors in all cases as files on a hard drive.
Configuration File Format
To define a federated database instance store with Google Cloud, you can specify the configuration parameters in JSON format. The configuration contains the Google Cloud data store and maps it to virtual collections that you can query.
The JSON configuration for data in Google Cloud uses the following fields:
```json
{
  "stores" : [
    {
      "name" : "<string>",
      "provider" : "<string>",
      "region" : "<string>",
      "bucket" : "<string>",
      "prefix": "<string>",
      "delimiter": "<string>"
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName": "<string>",
              "omitAttributes": <boolean>
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}
```
The JSON configuration for Google Cloud contains two top-level objects: stores and databases.
stores
The stores object defines each data store associated with the
federated database instance. The federated database instance store captures files in Google Cloud.
Data Federation can only access data stores defined in the stores object.
The stores object contains the following fields:
```json
"stores" : [
  {
    "name" : "<string>",
    "provider" : "<string>",
    "region" : "<string>",
    "bucket" : "<string>",
    "prefix": "<string>",
    "delimiter": "<string>"
  }
]
```
The following table describes the fields in the stores object:
| Field | Type | Necessity | Description |
|---|---|---|---|
| stores | array | Required | Array of objects where each object represents a data store to associate with the federated database instance. The federated database instance store captures files in a Google Cloud Storage bucket. Atlas Data Federation can only access data stores defined in the stores object. |
| stores.[n].name | string | Required | Name of the federated database instance store. The databases.[n].collections.[n].dataSources.[n].storeName field references this value as part of the mapping configuration. |
| stores.[n].provider | string | Required | Name of the cloud provider where the data is stored. The value must be "gcs" for Google Cloud Storage. |
| stores.[n].region | string | Required | Name of the Google Cloud region in which the Google Cloud Storage bucket is hosted. For a list of valid region names, see Google Cloud Platform (GCP). |
| stores.[n].bucket | string | Required | Name of the Google Cloud Storage bucket. Must exactly match the name of a Google Cloud Storage bucket that Atlas Data Federation must access. |
| stores.[n].prefix | string | Optional | Prefix Atlas Data Federation applies when searching for files in the Google Cloud Storage bucket. The federated database instance store prepends the value of prefix to the value of databases.[n].collections.[n].dataSources.[n].path to create the full path for files to ingest. Defaults to the root of the Google Cloud Storage bucket, retrieving all files. |
| stores.[n].delimiter | string | Optional | Delimiter that separates databases.[n].collections.[n].dataSources.[n].path segments in the federated database instance store. |
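The way a store's prefix combines with a data source path can be illustrated with a short Python sketch. It is not Atlas Data Federation's implementation: the full_path helper is hypothetical, and it assumes a single-character delimiter and that duplicate delimiters at the join point are collapsed.

```python
def full_path(prefix: str, path: str, delimiter: str = "/") -> str:
    """Join a store prefix and a data source path (illustrative only)."""
    # Assumption: the delimiter is a single character, so strip() can be
    # used to remove it from the edges of each part.
    left = prefix.strip(delimiter)
    right = path.lstrip(delimiter)
    if not left:
        # An empty prefix means the search starts at the bucket root.
        return delimiter + right
    return delimiter + left + delimiter + right


# Values taken from the example configuration on this page.
print(full_path("metrics", "/hardware/"))  # prints /metrics/hardware/
```

With an empty prefix the helper returns the path unchanged, which corresponds to searching from the root of the bucket.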
databases
The databases object defines the mapping between each
federated database instance store defined in stores and MongoDB collections
in the databases.
The databases object contains the following fields:
```json
"databases" : [
  {
    "name" : "<string>",
    "collections" : [
      {
        "name" : "<string>",
        "dataSources" : [
          {
            "storeName" : "<string>",
            "path" : "<string>",
            "defaultFormat" : "<string>",
            "provenanceFieldName": "<string>",
            "omitAttributes": <boolean>
          }
        ]
      }
    ],
    "maxWildcardCollections" : <integer>,
    "views" : [
      {
        "name" : "<string>",
        "source" : "<string>",
        "pipeline" : "<string>"
      }
    ]
  }
]
```
The following table describes the fields in the databases object:
| Field | Type | Necessity | Description |
|---|---|---|---|
| databases | array | Required | Array of objects where each object represents a database, its collections, and, optionally, any views on the collections. Each database can have multiple collections and views objects. |
| databases.[n].name | string | Required | Name of the database to which Atlas Data Federation maps the data contained in the data store. |
| databases.[n].collections | array | Required | Array of objects where each object represents a collection and data sources that map to a stores federated database instance store. |
| databases.[n].collections.[n].name | string | Required | Name of the collection to which Atlas Data Federation maps the data contained in each databases.[n].collections.[n].dataSources.[n].storeName. You can generate collection names dynamically from file paths by specifying * for the collection name and the collectionName() function in the path field. |
| databases.[n].collections.[n].dataSources | array | Required | Array of objects where each object represents a stores federated database instance store to map with the collection. |
| databases.[n].collections.[n].dataSources.[n].storeName | string | Required | Name of a federated database instance store to map to the collection. The value must match the name of a store defined in the stores array. |
| databases.[n].collections.[n].dataSources.[n].path | string | Required | Controls how Atlas Data Federation searches for and parses files in the storeName before mapping them to a collection. The federated database instance store prepends the value of stores.[n].prefix to the path to build the full path to search. See Define Path for S3 Data for more information. |
| databases.[n].collections.[n].dataSources.[n].defaultFormat | string | Optional | Default format that Data Federation assumes if it encounters a file without an extension while searching the storeName. If omitted, Data Federation attempts to detect the file type by processing a few bytes of the file. See also: Supported Data Formats |
| databases.[n].collections.[n].dataSources.[n].provenanceFieldName | string | Optional | Name for the field that includes the provenance of the documents in the results. If you specify this setting in the storage configuration, Atlas Data Federation returns the provenance details in this field for each document in the result. You can't configure this setting using the Visual Editor in the Atlas UI. |
| databases.[n].collections.[n].dataSources.[n].omitAttributes | boolean | Optional | Flag that specifies whether to omit the attributes (key and value pairs) that Atlas Data Federation adds to documents in the collection. You can specify one of the following values: false to add the attributes, or true to omit the attributes. If omitted, defaults to false and Atlas Data Federation adds the attributes. |
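Because Data Federation can only read from stores defined in the stores array, it can help to check a configuration for storeName values that point at no defined store before applying it. The following Python sketch does this for a configuration loaded as a dictionary; the undefined_store_refs function and the sample data are hypothetical and are not part of any Atlas tooling.

```python
def undefined_store_refs(config: dict) -> list:
    """Return 'db.collection -> storeName' for each unknown store reference."""
    defined = {store["name"] for store in config.get("stores", [])}
    missing = []
    for db in config.get("databases", []):
        for coll in db.get("collections", []):
            for source in coll.get("dataSources", []):
                if source["storeName"] not in defined:
                    missing.append(
                        f'{db["name"]}.{coll["name"]} -> {source["storeName"]}'
                    )
    return missing


# Hypothetical configuration with one valid and one dangling reference.
sample = {
    "stores": [{"name": "datacenter-alpha"}],
    "databases": [
        {
            "name": "metrics",
            "collections": [
                {"name": "hardware",
                 "dataSources": [{"storeName": "datacenter-alpha"}]},
                {"name": "software",
                 "dataSources": [{"storeName": "datacenter-beta"}]},
            ],
        }
    ],
}

print(undefined_store_refs(sample))  # prints ['metrics.software -> datacenter-beta']
```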
Example Configuration for Google Cloud Storage Bucket
Consider a Google Cloud Storage bucket datacenter-alpha containing data
collected from a datacenter:
```
|--metrics
    |--hardware
```
The /metrics/hardware path stores JSON files with metrics
derived from the datacenter hardware, where each filename is
the UNIX timestamp in milliseconds of the 24-hour period
covered by that file:
/hardware/1564671291998.json
The following configuration:
- Defines a federated database instance store on the datacenter-alpha Google Cloud Storage bucket in the us-central1 Google Cloud region. The federated database instance store is specifically restricted to include only data files in the metrics directory path. A delimiter of / is defined to simulate a file system hierarchy for ease of navigation and retrieval.
- Maps files from the hardware directory to a MongoDB database datacenter-alpha-metrics and collection hardware. The configuration mapping includes parsing logic for capturing the timestamp implied in the filename.
```json
{
  "stores" : [
    {
      "name" : "datacenter-alpha",
      "provider" : "gcs",
      "region" : "us-central1",
      "bucket" : "datacenter-alpha",
      "prefix": "metrics",
      "delimiter": "/"
    }
  ],
  "databases" : [
    {
      "name" : "datacenter-alpha-metrics",
      "collections" : [
        {
          "name" : "hardware",
          "dataSources" : [
            {
              "storeName" : "datacenter-alpha",
              "path" : "/hardware/{date date}"
            }
          ]
        }
      ]
    }
  ]
}
```
Atlas Data Federation parses the Google Cloud Storage bucket datacenter-alpha and processes
all files under /metrics/hardware/. The collections object
uses the path parsing syntax to map the
filename to the date field, which is an ISO-8601 date, in each
document. If a matching date field does not exist in a document,
Atlas Data Federation adds it.
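To make the timestamp encoded in a filename concrete, the following Python sketch converts the sample filename 1564671291998.json into an ISO-8601 string. It only illustrates the arithmetic on a millisecond UNIX timestamp and makes no claim about how Atlas Data Federation parses the path internally; the filename_to_iso helper is hypothetical.

```python
from datetime import datetime, timezone


def filename_to_iso(filename: str) -> str:
    """Convert '<epoch-millis>.json' to an ISO-8601 UTC timestamp string."""
    millis = int(filename.rsplit(".", 1)[0])
    # Split into whole seconds and milliseconds to avoid float rounding.
    seconds, ms = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    moment = moment.replace(microsecond=ms * 1000)
    return moment.isoformat(timespec="milliseconds")


print(filename_to_iso("1564671291998.json"))  # prints 2019-08-01T14:54:51.998+00:00
```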
Users connected to the federated database instance can use the MongoDB Query Language and
supported aggregations to analyze data in the Google Cloud Storage bucket through
the datacenter-alpha-metrics.hardware collection.
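As a rough aid for seeing which namespaces a storage configuration exposes, the Python sketch below walks the databases array and lists each database.collection and database.view pair. The queryable_namespaces function is hypothetical, and the configuration is a trimmed copy of the example on this page.

```python
def queryable_namespaces(config: dict) -> list:
    """List '<database>.<collection or view>' names defined in a configuration."""
    names = []
    for db in config.get("databases", []):
        for coll in db.get("collections", []):
            names.append(f'{db["name"]}.{coll["name"]}')
        for view in db.get("views", []):
            names.append(f'{db["name"]}.{view["name"]}')
    return names


# Trimmed copy of the example configuration above.
config = {
    "stores": [{"name": "datacenter-alpha", "provider": "gcs"}],
    "databases": [
        {
            "name": "datacenter-alpha-metrics",
            "collections": [
                {
                    "name": "hardware",
                    "dataSources": [
                        {"storeName": "datacenter-alpha",
                         "path": "/hardware/{date date}"}
                    ],
                }
            ],
        }
    ],
}

print(queryable_namespaces(config))  # prints ['datacenter-alpha-metrics.hardware']
```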