ATLAS-4633: Multiple typos in official Apache Atlas Docs

Signed-off-by: Pinal Shah <pinal.shah@freestoneinfotech.com>
ranger_qe_sr21089 2022-07-13 12:04:48 +05:30 committed by Pinal Shah
parent e5e63c150e
commit bc41d13255
33 changed files with 101 additions and 101 deletions

View File

@ -6,7 +6,7 @@ menu: ASF
import {CustomLink} from "theme/components/shared/common/CustomLink";
# ASF Infomation
# ASF Information
1. <CustomLink href="http://www.apache.org/foundation/how-it-works.html">How Apache Works</CustomLink>

View File

@ -13,7 +13,7 @@ import SyntaxHighlighter from 'react-syntax-highlighter';
# Business Metadata
## Overview
Atlas typesystem allows users to define a model and create entities for the metadata objects they want to manage.
Typically the model captures technical attributes - like name, description, create time, number of replicas, etc; and
Typically, the model captures technical attributes - like name, description, create time, number of replicas, etc.; and
metadata objects are created and updated by processes that monitor the real objects. It is often necessary to
augment technical attributes with additional attributes to capture business details that can help organize, search and
manage metadata entities. For example, a steward from marketing department can define set of attributes for a campaign,

View File

@ -104,9 +104,9 @@ pgp downloaded_file.asc`}
* Entity Purge: added REST APIs to purge deleted entities
* Search: ability to find entities by more than one classification
* Performance: improvements in lineage retrieval and classification-propagation
* Notification: ability to process notificaitons from multiple Kafka topics
* Notification: ability to process notifications from multiple Kafka topics
* Hive Hook: tracks process-executions via hive_process_execution entities
* Hive Hook: catures DDL operations via hive_db_ddl and hive_table_ddl entities
* Hive Hook: captures DDL operations via hive_db_ddl and hive_table_ddl entities
* Notification: introduced shell entities to record references to non-existing entities in notifications
* Spark: added model to capture Spark entities, processes and relationships
* AWS S3: introduced updated model to capture AWS S3 entities and relationships
@ -133,7 +133,7 @@ pgp downloaded_file.asc`}
* Notification processing to support batch-commits
* New option in notification processing to ignore potentially incorrect hive_column_lineage
* Updated Hive hook to avoid duplicate column-lineage entities; also updated Atlas server to skip duplicate column-lineage entities
* Improved batch processing in notificaiton handler to avoid processing of an entity multiple times
* Improved batch processing in notification handler to avoid processing of an entity multiple times
* Add option to ignore/prune metadata for temporary/staging hive tables
* Avoid unnecessary lookup when creating new relationships
* UI Improvements:
@ -159,7 +159,7 @@ pgp downloaded_file.asc`}
* Support for JanusGraph graph database
* New DSL implementation, using ANTLR instead of Scala
* Removal of older type system implementation in atlas-typesystem library
* Metadata security - fine grained authorization
* Metadata security - fine-grained authorization
* Notification enhancements to support V2 style data structures
* Jackson library update from 1.9.13 to 2.9.2
* Classification propagation via entity relationships

View File

@ -12,7 +12,7 @@ import Img from 'theme/components/shared/Img'
# Glossary
A Glossary provides appropriate vocabularies for business users and it allows the terms (words) to be related to each
A Glossary provides appropriate vocabularies for business users, and it allows the terms (words) to be related to each
other and categorized so that they can be understood in different contexts. These terms can be then mapped to assets
like a Database, tables, columns etc. This helps abstract the technical jargon associated with the repositories and
allows the user to discover/work with data in the vocabulary that is more familiar to them.
@ -29,13 +29,13 @@ allows the user to discover/work with data in the vocabulary that is more famili
### What is a Glossary term ?
A term is a useful word for an enterprise. For the term(s) to be useful and meaningful, they need to grouped around their
use and context. A term in Apache Atlas must have a unique qualifiedName, there can be term(s) with same name but they
use and context. A term in Apache Atlas must have a unique qualifiedName, there can be term(s) with same name, but they
cannot belong to the same glossary. Term(s) with same name can exist only across different glossaries. A term name can
contain spaces, underscores and dashes (as natural ways of referring to words) but no "." or "@", as the qualifiedName
takes the following form `term name`@`glossary qualified name`. The fully qualified name makes it easier to work with
a specific term.
A term can only belong to single glossary and it's lifecycle is bound to the same i.e. if the Glossary is deleted then
A term can only belong to single glossary, and it's lifecycle is bound to the same i.e. if the Glossary is deleted then
the term gets deleted as well. A term can belong to zero or more categories, which allows scoping them into narrower or
wider contexts. A term can be assigned/linked to zero or more entities in Apache Atlas. A term can be classified using
classifications (tags) and the same classification gets applied to the entities that the term is assigned to.
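For illustration only, a glossary and a term anchored to it could be created through Atlas' v2 REST API along the following lines; the host, credentials and names used below are placeholders, not values taken from this page.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# illustrative only: create a glossary, then a term anchored to it (host, credentials and names are placeholders)
curl -u admin:admin -X POST -H "Content-Type: application/json" -d '{"name": "SalesGlossary", "shortDescription": "Terms used by sales teams"}' http://localhost:21000/api/atlas/v2/glossary
# the term below gets the qualifiedName SalesTerm@SalesGlossary; use the guid returned by the call above
curl -u admin:admin -X POST -H "Content-Type: application/json" -d '{"name": "SalesTerm", "anchor": {"glossaryGuid": "<glossary-guid>"}}' http://localhost:21000/api/atlas/v2/glossary/term`}
</SyntaxHighlighter>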
@ -43,7 +43,7 @@ classifications (tags) and the same classification gets applied to the entities
### What is a Glossary category ?
A category is a way of organizing the term(s) so that the term's context can be enriched. A category may or may not have
contained hierarchies i.e. child category hierarchy. A category's qualifiedName is derived using it's hierarchical location
contained hierarchies i.e. child category hierarchy. A category's qualifiedName is derived using its hierarchical location
within the glossary e.g. `Category name`.`parent category qualifiedName`. This qualified name gets updated when any
hierarchical change happens, e.g. addition of a parent category, removal of parent category or change of parent category.
@ -52,19 +52,19 @@ hierarchical change happens, e.g. addition of a parent category, removal of pare
Apache Atlas UI has been updated to provide user-friendly interface to work with various aspects of glossary, including:
* create glossaries, terms and categories
* create various relationships between terms - like synonymns, antonymns, seeAlso
* create various relationships between terms - like synonyms, antonyms, seeAlso
* organize categories in hierarchies
* assign terms to entities
* search for entities using associated terms
Most of glossary related UI can be found under a new tab named GLOSSARY, which is present right next to existing
Most glossary related UI can be found under a new tab named GLOSSARY, which is present right next to existing
familiar tabs SEARCH and CLASSIFICATION.
#### **Glossary tab**
Apache Atlas UI provides two ways to work with a glossary - term view and category view.
Term view allows an user to perform the following operations:
Term view allows a user to perform the following operations:
* create, update and delete terms
* add, remove and update classifications associated with a term
@ -72,7 +72,7 @@ Term view allows an user to perform the following operations:
* create various relationships between terms
* view entities associated with a term
Category view allows an user to perform the following operations:
Category view allows a user to perform the following operations:
* create, update and delete categories and sub-categories
* associate terms to categories

View File

@ -37,7 +37,7 @@ becomes unavailable either because it is deliberately stopped, or due to unexpec
instances will automatically be elected as an 'active' instance and start to service user requests.
An 'active' instance is the only instance that can respond to user requests correctly. It can create, delete, modify
or respond to queries on metadata objects. A 'passive' instance will accept user requests, but will redirect them
or respond to the queries on metadata objects. A 'passive' instance will accept user requests, but will redirect them
using HTTP redirect to the currently known 'active' instance. Specifically, a passive instance will not itself
respond to any queries on metadata objects. However, all instances (both active and passive), will respond to admin
requests that return information about that instance.
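One way to check which role an instance currently holds is to query the admin status endpoint on each instance; the sketch below assumes default ports and admin credentials.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# each instance reports its own role; hosts and credentials are placeholders
curl -u admin:admin http://atlas-host1.company.com:21000/api/atlas/admin/status
# a response like {"Status":"ACTIVE"} indicates the active instance
curl -u admin:admin http://atlas-host2.company.com:21000/api/atlas/admin/status
# a response like {"Status":"PASSIVE"} indicates a passive instance`}
</SyntaxHighlighter>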

View File

@ -54,7 +54,7 @@ Follow the instructions below to setup Atlas hook in HBase:
The following properties in atlas-application.properties control the thread pool and notification details:
<SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}>
{`atlas.hook.hbase.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in HBase operations. Default: false
{`atlas.hook.hbase.synchronous=false # whether to run the hook synchronously. false is recommended to avoid delays in HBase operations. Default: false
atlas.hook.hbase.numRetries=3 # number of retries for notification failure. Default: 3
atlas.hook.hbase.queueSize=10000 # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary
@ -68,12 +68,12 @@ Other configurations for Kafka notification producer can be specified by prefixi
For list of configuration supported by Kafka producer, please refer to [Kafka Producer Configs](http://kafka.apache.org/documentation/#producerconfigs)
## NOTES
* Only the namespace, table and column-family create/update/ delete operations are captured by Atlas HBase hook. Changes to columns are be captured.
* Only the namespace, table and column-family create/update/delete operations are captured by Atlas HBase hook. Changes to columns are be captured.
## Importing HBase Metadata
Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas.
This utility can be used to initialize Apache Atlas with namespaces/tables present in a Apache HBase cluster.
This utility can be used to initialize Apache Atlas with namespaces/tables present in an Apache HBase cluster.
This utility supports importing metadata of a specific table, tables in a specific namespace or all tables.
<SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}>

View File

@ -80,7 +80,7 @@ Follow the instructions below to setup Atlas hook in Hive:
The following properties in atlas-application.properties control the thread pool and notification details:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`atlas.hook.hive.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in Hive query completion. Default: false
{`atlas.hook.hive.synchronous=false # whether to run the hook synchronously. false is recommended to avoid delays in Hive query completion. Default: false
atlas.hook.hive.numRetries=3 # number of retries for notification failure. Default: 3
atlas.hook.hive.queueSize=10000 # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary
@ -128,7 +128,7 @@ The lineage is captured as
## NOTES
* Column level lineage works with Hive version 1.2.1 after the patch for <a href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is applied to Hive source
* Since database name, table name and column names are case insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
* Since database name, table name and column names are case-insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
* The following hive operations are captured by hive hook currently
* create database
* create table/view, create table as select

View File

@ -39,7 +39,7 @@ This is used to add entities in Atlas using the model detailed above.
Follow the instructions below to setup Atlas hook in Hive:
Add the following properties to to enable Atlas hook in Sqoop:
Add the following properties to enable Atlas hook in Sqoop:
* Set-up Atlas hook in `<sqoop-conf>`/sqoop-site.xml by adding the following:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
@ -53,7 +53,7 @@ Add the following properties to to enable Atlas hook in Sqoop:
* untar apache-atlas-${project.version}-sqoop-hook.tar.gz
* cd apache-atlas-sqoop-hook-${project.version}
* Copy entire contents of folder apache-atlas-sqoop-hook-${project.version}/hook/sqoop to `<atlas package>`/hook/sqoop
* Copy `<atlas-conf>`/atlas-application.properties to to the sqoop conf directory `<sqoop-conf>`/
* Copy `<atlas-conf>`/atlas-application.properties to the sqoop conf directory `<sqoop-conf>`/
* Link `<atlas package>`/hook/sqoop/*.jar in sqoop lib
@ -61,7 +61,7 @@ Add the following properties to to enable Atlas hook in Sqoop:
The following properties in atlas-application.properties control the thread pool and notification details:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`atlas.hook.sqoop.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in Sqoop operation completion. Default: false
{`atlas.hook.sqoop.synchronous=false # whether to run the hook synchronously. false is recommended to avoid delays in Sqoop operation completion. Default: false
atlas.hook.sqoop.numRetries=3 # number of retries for notification failure. Default: 3
atlas.hook.sqoop.queueSize=10000 # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary

View File

@ -117,7 +117,7 @@ STORM_JAR_JVM_OPTS:"-Datlas.conf=$ATLAS_HOME/conf/"
where ATLAS_HOME is pointing to where ATLAS is installed.
You could also set this up programatically in Storm Config as:
You could also set this up programmatically in Storm Config as:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`Config stormConf = new Config();

View File

@ -54,7 +54,7 @@ The current implementation has 2 options. Both are optional:
* _fetchType_ This option configures the approach used for fetching entities. It has the following values:
* _FULL_: This fetches all the entities that are connected directly and indirectly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table, database and all the other tables within the database.
* _CONNECTED_: This fetches all the etnties that are connected directly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table and the database entity only.
* _CONNECTED_: This fetches all the entities that are connected directly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table and the database entity only.
* _INCREMENTAL_: See [here](#/IncrementalExport) for details.
@ -104,7 +104,7 @@ The _AtlasExportRequest_ below specifies the _fetchType_ as _FULL_. The _matchTy
}`}
</SyntaxHighlighter>
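To submit such a request, the export admin REST endpoint can be called with the request JSON as the payload; the sketch below assumes the default endpoint and admin credentials, and writes the resulting ZIP to a local file.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# submit the export request saved in export-request.json and save the resulting ZIP locally (endpoint/credentials assumed)
curl -X POST -u admin:admin -H "Content-Type: application/json" -d @export-request.json http://localhost:21000/api/atlas/admin/export -o export.zip`}
</SyntaxHighlighter>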
The _AtlasExportRequest_ below specifies the _guid_ instead of _uniqueAttribues_ to fetch _accounts@cl1_.
The _AtlasExportRequest_ below specifies the _guid_ instead of _uniqueAttributes_ to fetch _accounts@cl1_.
<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
@ -117,7 +117,7 @@ The _AtlasExportRequest_ below specifies the _guid_ instead of _uniqueAttribues_
}`}
</SyntaxHighlighter>
The _AtlasExportRequest_ below specifies the _fetchType_ as _connected_. The _matchType_ option will fetch _accountsReceivable_, _accountsPayable_, etc present in the database.
The _AtlasExportRequest_ below specifies the _fetchType_ as _connected_. The _matchType_ option will fetch _accountsReceivable_, _accountsPayable_, etc. present in the database.
<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{

View File

@ -103,7 +103,7 @@ Steps to use the behavior:
The output of Export has _atlas-typedef.json_ that contains the type definitions for the entities exported.
By default (that is if no options are specified), the type definitions are imported and applied to the system being imported to. The entity import is performed after this.
By default, (that is if no options are specified), the type definitions are imported and applied to the system being imported to. The entity import is performed after this.
In some cases, you would not want to modify the type definitions. The import may be better off failing than the types be modified.
@ -152,7 +152,7 @@ _CURL_
#### Handling Large Imports
By default, the Import Service stores all of the data in memory. This may be limiting for ZIPs containing a large amount of data.
By default, the Import Service stores all the data in memory. This may be limiting for ZIPs containing a large amount of data.
To configure the temporary directory use the application property _atlas.import.temp.directory_. If this property is left blank, the default in-memory implementation is used.
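For reference, a previously exported ZIP could be imported with a multipart request along these lines; the endpoint, credentials and file names below are assumptions for illustration.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# import a previously exported ZIP; importOptions.json is optional (endpoint/credentials assumed)
curl -g -X POST -u admin:admin -H "Content-Type: multipart/form-data" -F request=@importOptions.json -F data=@export.zip http://localhost:21000/api/atlas/admin/import`}
</SyntaxHighlighter>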

View File

@ -28,8 +28,8 @@ The existing transformation frameworks allowed this to happen.
#### Reason for New Transformation Framework
While the existing framework provided the basic benefits of the transformation framework, it did not have support for some of the commonly used Atlas types. Which meant that users of this framework would have to meticulously define transformations for every type they are working with. This can be tedious and potentially error-prone.
The new framework addresses this problem by providing built-in transformations for some of the commonly used types. It can also be extended to accommodate new types.
While the existing framework provided the basic benefits of the transformation framework, it did not have support for some commonly used Atlas types. Which meant that users of this framework would have to meticulously define transformations for every type they are working with. This can be tedious and potentially error-prone.
The new framework addresses this problem by providing built-in transformations for some commonly used types. It can also be extended to accommodate new types.
#### Approach

View File

@ -27,7 +27,7 @@ The Import-Export APIs for Atlas facilitate the transfer of data to and from a c
The APIs when integrated with backup and/or disaster recovery process will ensure participation of Atlas.
### Introduction
There are 2 broad categories viz. Export & Import. The details of the APIs are as discussed below.
There are 2 broad categories' viz. Export & Import. The details of the APIs are as discussed below.
The APIs are available only to _admin_ user.

View File

@ -159,5 +159,5 @@ Apache Atlas 1.0 introduces number of new features. For data that is migrated, t
#### Handling of Entity Definitions that use Classifications as Types
This features is no longer supported. Classifications that are used as types in _attribute definitions_ (_AttributeDefs_) are converted in to new types whose name has _legacy_ prefix. These are then handled like any other type.
This feature is no longer supported. Classifications that are used as types in _attribute definitions_ (_AttributeDefs_) are converted in to new types whose name has _legacy_ prefix. These are then handled like any other type.
Creation of such types was prevented in an earlier release, hence only type definitions have potential to exist. Care has been taken to handle entities of this type as well.

View File

@ -37,7 +37,7 @@ The _additionalInfo_ attribute property is discussed in detail below.
#### Export/Import Audits
The table has following columns:
The table has the following columns:
* _Operation_: EXPORT or IMPORT that denotes the operation performed on instance.
* _Source Server_: For an export operation performed on this instance, the value in this column will always be the cluster name of the current Atlas instance. This is the value specified in _atlas-application.properties_ by the key _atlas.cluster.name_. If not value is specified 'default' is used.
@ -67,7 +67,7 @@ The following export request will end up creating _AtlasServer_ entity with _clM
Often times it is necessary to disambiguate the name of the cluster by specifying the location or the data center within which the Atlas instance resides.
The name of the cluster can be specified by separating the location name and cluster name by '$'. For example, a clsuter name specified as 'SFO$cl1' can be a cluster in San Fancisco (SFO) data center with the name 'cl1'.
The name of the cluster can be specified by separating the location name and cluster name by '$'. For example, a cluster name specified as 'SFO$cl1' can be a cluster in San Francisco (SFO) data center with the name 'cl1'.
The _AtlasServer_ will handle this and set its name as 'cl1' and _fullName_ as 'SFO@cl1'.

View File

@ -13,11 +13,11 @@ import SyntaxHighlighter from 'react-syntax-highlighter';
#### Background
Entity attributes are specified using attribute definitions. An attributes persistence strategy is determined by based on their type.
Entity attributes are specified using attribute definitions. An attributes' persistence strategy is determined by based on their type.
Primitive types are persisted as properties within the vertex of their parent.
Non-primitive attributes get a vertex of their own and and edge is created between the parent the child to establish ownership.
Non-primitive attributes get a vertex of their own and edge is created between the parent the child to establish ownership.
Attribute with _isSoftReference_ option set to _true_, is non-primitive attribute that gets treatment of a primitive attribute.
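As a rough sketch (the type and attribute names below are invented), an attribute can be marked as a soft reference through the options map of its attribute definition when the type is registered:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# illustrative only: 'sample_app' and 'sample_config' are made-up names; 'sample_config' must already exist as an entity type
curl -u admin:admin -X POST -H "Content-Type: application/json" -d '{"entityDefs": [{"name": "sample_app", "superTypes": ["Referenceable"], "attributeDefs": [{"name": "config", "typeName": "sample_config", "isOptional": true, "cardinality": "SINGLE", "options": {"isSoftReference": "true"}}]}]}' http://localhost:21000/api/atlas/v2/types/typedefs`}
</SyntaxHighlighter>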

View File

@ -60,7 +60,7 @@ Notification includes the following data.
</SyntaxHighlighter>
Apache Atlas 1.0 can be configured to send notifications in older version format, instead of the latest version format.
This can be helpful in deployments that are not yet ready to process notifications in latest version format.
This can be helpful in deployments that are not yet ready to process notifications in the latest version format.
To configure Apache Atlas 1.0 to send notifications in earlier version format, please set following configuration in atlas-application.properties:
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>

View File

@ -40,7 +40,7 @@ capabilities around these data assets for data scientists, analysts and the data
* SQL like query language to search entities - Domain Specific Language (DSL)
### Security & Data Masking
* Fine grained security for metadata access, enabling controls on access to entity instances and operations like add/update/remove classifications
* Fine-grained security for metadata access, enabling controls on access to entity instances and operations like add/update/remove classifications
* Integration with Apache Ranger enables authorization/data-masking on data access based on classifications associated with entities in Apache Atlas. For example:
* who can access data classified as PII, SENSITIVE
* customer-service users can only see last 4 digits of columns classified as NATIONAL_ID

View File

@ -8,7 +8,7 @@ submenu: Mailing Lists
# Project Mailing Lists
* These are the mailing lists that have been established for this project. For each list, there is a subscribe, unsubscribe, and an archive link.
* These are the mailing lists that have been established for this project. For each list, there is a - subscribe, unsubscribe, and an archive link.
| **Name** | **Subscribe** | **Unsubscribe** | **Post** | **Archive** |

View File

@ -10,7 +10,7 @@ submenu: Team List
import TeamList from 'theme/components/shared/TeamList'
#### A successful project requires many people to play many roles. Some members write code or documentation, while others are valuable as testers, submitting patches and suggestions.
#### The team is comprised of Members and Contributors. Members have direct access to the source of a project and actively evolve the code-base. Contributors improve the project through submission of patches and suggestions to the Members. The number of Contributors to the project is unbounded. Get involved today. All contributions to the project are greatly appreciated.
#### The team comprises Members and Contributors. Members have direct access to the source of a project and actively evolve the code-base. Contributors improve the project through submission of patches and suggestions to the Members. The number of Contributors to the project is unbounded. Get involved today. All contributions to the project are greatly appreciated.
## Members

View File

@ -23,9 +23,9 @@ Benefits of DSL:
* Use of classifications is accounted for in the syntax.
* Provides way to group and aggregate results.
We will be using the quick start dataset in the examples that follow. This dataset is comprehensive enough to be used to to demonstrate the various features of the language.
We will be using the quick start dataset in the examples that follow. This dataset is comprehensive enough to be used to demonstrate the various features of the language.
For details on the grammar, please refer to Atlas DSL Grammer on [Github](https://github.com/apache/atlas/blob/master/repository/src/main/java/org/apache/atlas/query/antlr4/AtlasDSLParser.g4) (Antlr G4 format).
For details on the grammar, please refer to Atlas DSL Grammar on [GitHub](https://github.com/apache/atlas/blob/master/repository/src/main/java/org/apache/atlas/query/antlr4/AtlasDSLParser.g4) (Antlr G4 format).
## Using Advanced Search
@ -56,7 +56,7 @@ In the absence of _where_ for filtering on the source, the dataset fetched by th
The _where_ clause allows for filtering over the dataset. This achieved by using conditions within the where clause.
A conditions is identifier followed by an operator followed by a literal. Literal must be enclosed in single or double quotes. Example, _name = "Sales"_. An identifier can be name of the property of the type specified in the _from_ clause or an alias.
A condition is an identifier followed by an operator followed by a literal. Literal must be enclosed in single or double quotes. Example, _name = "Sales"_. An identifier can be the name of the property of the type specified in the _from_ clause or an alias.
Example: To retrieve entity of type _Table_ with a specific name say time_dim:
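A sketch of how such a query could be issued through the DSL search REST endpoint follows; the endpoint and credentials are the usual defaults, assumed here.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# run the DSL query via the search REST endpoint; filters Table entities on the name property
curl -u admin:admin -G --data-urlencode 'query=Table where name = "time_dim"' http://localhost:21000/api/atlas/v2/search/dsl`}
</SyntaxHighlighter>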
@ -125,7 +125,7 @@ Dates in this format follow this notation:
* _yyyy-MM-ddTHH:mm:ss.SSSZ_. Which means, year-month-day followed by time in hour-minutes-seconds-milli-seconds. Date and time need to be separated by 'T'. It should end with 'Z'.
* _yyyy-MM-dd_. Which means, year-month-day.
Example: Date represents December 11, 2017 at 2:35 AM.
Example: Date represents December 11, 2017, at 2:35 AM.
<SyntaxHighlighter wrapLines={true} language="sql" style={theme.dark}>
{`2017-12-11T02:35:0.0Z`}
@ -140,7 +140,7 @@ Example: To retrieve entity of type _Table_ created within 2017 and 2018.
#### Using Boolean Literals
Properties of entities of type boolean can be used within queries.
Eample: To retrieve entity of type hdfs_path whose attribute _isFile_ is set to _true_ and whose name is _Invoice_.
Example: To retrieve entity of type hdfs_path whose attribute _isFile_ is set to _true_ and whose name is _Invoice_.
<SyntaxHighlighter wrapLines={true} language="sql" style={theme.dark}>
{`from hdfs_path where isFile = true or name = "Invoice"`}
@ -151,7 +151,7 @@ Valid values for boolean literals are 'true' and 'false'.
### Existence of a Property
The has keyword can be used with or without the where clause. It is used to check existence of a property in an entity.
Example: To retreive entity of type Table with a property locationUri.
Example: To retrieve entity of type Table with a property locationUri.
<SyntaxHighlighter wrapLines={true} language="html" style={theme.dark}>
{`Table has locationUri
@ -240,7 +240,7 @@ Example: To retrieve all the entities that are tagged with _Dimension_ classific
{`Dimension where Dimension.priority = "high"`}
</SyntaxHighlighter>
###Non Primitive attribute Filtering
###Non-Primitive attribute Filtering
In the discussion so far we looked at where clauses with primitive types. This section will look at using properties that are non-primitive types.
#### Relationship-based filtering
@ -432,7 +432,7 @@ Example: To know the number of entities owned by each owner.
</SyntaxHighlighter>
### Using System Attributes
Each type defined within Atlas gets few attributes by default. These attributes help with internal book keeping of the entities. All the system attributes are prefixed with '__' (double underscore). This helps in identifying them from other attributes.
Each type defined within Atlas gets few attributes by default. These attributes help with internal bookkeeping of the entities. All the system attributes are prefixed with '__' (double underscore). This helps in identifying them from other attributes.
Following are the system attributes:
* __guid Each entity within Atlas is assigned a globally unique identifier (GUID for short).
* __modifiedBy Name of the user who last modified the entity.
@ -518,4 +518,4 @@ The following clauses are no longer supported:
## Resources
* Antlr [Book](https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference).
* Antlr [Quick Start](https://github.com/antlr/antlr4/blob/master/doc/getting-started.md).
* Atlas DSL Grammar on [Github](https://github.com/apache/atlas/blob/master/repository/src/main/java/org/apache/atlas/query/antlr4/AtlasDSLParser.g4) (Antlr G4 format).
* Atlas DSL Grammar on [GitHub](https://github.com/apache/atlas/blob/master/repository/src/main/java/org/apache/atlas/query/antlr4/AtlasDSLParser.g4) (Antlr G4 format).

View File

@ -79,7 +79,7 @@ Following authorization policy allows user 'admin' to perform export/import admi
### Apache Ranger access audit for Apache Atlas authorizations
Apache Ranger authorization plugin generates audit logs with details of the access authorized by the plugin. The details
include the object accessed (eg. hive_table with ID cost_savings.claim_savings@cl1), type of access performed (eg.
include the object accessed (e.g. hive_table with ID cost_savings.claim_savings@cl1), type of access performed (e.g.
entity-add-classification, entity-remove-classification), name of the user, time of access and the IP address the access
request came from - as shown in the following image.

View File

@ -129,7 +129,7 @@ Roles defined above can be assigned (granted) to users as shown below:
</SyntaxHighlighter>
Roles can be assigned (granted) to user-groups as shown below. An user can belong to multiple groups; roles assigned to
Roles can be assigned (granted) to user-groups as shown below. A user can belong to multiple groups; roles assigned to
all groups the user belongs to will be used to authorize the access.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>

View File

@ -112,7 +112,7 @@ atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
atlas.authentication.method.ldap.ad.default.role=ROLE_USER`}
</SyntaxHighlighter>
### LDAP Directroy
### LDAP Directory
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`atlas.authentication.method.ldap.url=ldap://<Ldap server ip>:389
@ -130,8 +130,8 @@ atlas.authentication.method.ldap.default.role=ROLE_USER`}
### Keycloak Method.
To enable Keycloak authentication mode in Atlas, set the property `atlas.authentication.method.keycloak` to true and also set the property `atlas.authentication.method.keycloak.file` to the localtion of your `keycloak.json` in `atlas-application.properties`.
Also set `atlas.authentication.method.keycloak.ugi-groups` to false if you want to pickup groups from Keycloak. By default the groups will be picked up from the *roles* defined in Keycloak. In case you want to use the groups
To enable Keycloak authentication mode in Atlas, set the property `atlas.authentication.method.keycloak` to true and also set the property `atlas.authentication.method.keycloak.file` to the location of your `keycloak.json` in `atlas-application.properties`.
Also set `atlas.authentication.method.keycloak.ugi-groups` to false if you want to pickup groups from Keycloak. By default, the groups will be picked up from the *roles* defined in Keycloak. In case you want to use the groups
you need to create a mapping in keycloak and define `atlas.authentication.method.keycloak.groups_claim` equal to the token claim name. Make sure **not** to use the full group path and add the information to the access token.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
@ -140,7 +140,7 @@ atlas.authentication.method.keycloak.file=/opt/atlas/conf/keycloak.json
atlas.authentication.method.keycloak.ugi-groups=false`}
</SyntaxHighlighter>
Setup you keycloak.json per instructions from Keycloak. Make sure to include `"principal-attribute": "preferred_username"` to ensure readable user names and `"autodetect-bearer-only": true`.
Setup you keycloak.json per instructions from Keycloak. Make sure to include `"principal-attribute": "preferred_username"` to ensure readable usernames and `"autodetect-bearer-only": true`.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`{

View File

@ -27,8 +27,8 @@ Both SSL one-way (server authentication) and two-way (server and client authenti
* `keystore.file` - the path to the keystore file leveraged by the server. This file contains the server certificate.
* `truststore.file` - the path to the truststore file. This file contains the certificates of other trusted entities (e.g. the certificates for client processes if two-way SSL is enabled). In most instances this can be set to the same value as the keystore.file property (especially if one-way SSL is enabled).
* `client.auth.enabled` (false|true) [default: false] - enable/disable client authentication. If enabled, the client will have to authenticate to the server during the transport session key creation process (i.e. two-way SSL is in effect).
* `cert.stores.credential.provider.path` - the path to the Credential Provider store file. The passwords for the keystore, truststore, and server certificate are maintained in this secure file. Utilize the cputil script in the 'bin' directoy (see below) to populate this file with the passwords required.
* `atlas.ssl.exclude.cipher.suites` - the excluded Cipher Suites list - *NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.* are weak and unsafe Cipher Suites that are excluded by default. If additional Ciphers need to be excluded, set this property with the default Cipher Suites such as atlas.ssl.exclude.cipher.suites=.*NULL.*, .*RC4.*, .*MD5.*, .*DES.*, .*DSS.*, and add the additional Ciper Suites to the list with a comma separator. They can be added with their full name or a regular expression. The Cipher Suites listed in the atlas.ssl.exclude.cipher.suites property will have precedence over the default Cipher Suites. One would keep the default Cipher Suites, and add additional ones to be safe.
* `cert.stores.credential.provider.path` - the path to the Credential Provider store file. The passwords for the keystore, truststore, and server certificate are maintained in this secure file. Utilize the cputil script in the 'bin' directory (see below) to populate this file with the passwords required.
* `atlas.ssl.exclude.cipher.suites` - the excluded Cipher Suites list - *NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.* are weak and unsafe Cipher Suites that are excluded by default. If additional Ciphers need to be excluded, set this property with the default Cipher Suites such as atlas.ssl.exclude.cipher.suites=.*NULL.*, .*RC4.*, .*MD5.*, .*DES.*, .*DSS.*, and add the additional Cipher Suites to the list with a comma separator. They can be added with their full name or a regular expression. The Cipher Suites listed in the atlas.ssl.exclude.cipher.suites property will have precedence over the default Cipher Suites. One would keep the default Cipher Suites, and add additional ones to be safe.
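Put together, a minimal one-way SSL fragment of atlas-application.properties could look like the sketch below; the paths shown are placeholders and the property set is illustrative rather than exhaustive.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# illustrative one-way SSL settings in atlas-application.properties; paths are placeholders
atlas.enableTLS=true
keystore.file=/path/to/atlas.keystore
truststore.file=/path/to/atlas.truststore
client.auth.enabled=false
cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks`}
</SyntaxHighlighter>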
#### Credential Provider Utility Script
@ -58,7 +58,7 @@ The properties for configuring service authentication are:
### JAAS configuration
In a secure cluster, some of the components (such as Kafka) that Atlas interacts with, require Atlas to authenticate itself to them using JAAS. The following properties are used to set up appropriate JAAS Configuration.
In a secure cluster, some components (such as Kafka) that Atlas interacts with, require Atlas to authenticate itself to them using JAAS. The following properties are used to set up appropriate JAAS Configuration.
* `atlas.jaas.client-id.loginModuleName` - the authentication method used by the component (for example, com.sun.security.auth.module.Krb5LoginModule)
* `atlas.jaas.client-id.loginModuleControlFlag` (required|requisite|sufficient|optional) [default: required]
@ -126,9 +126,9 @@ MyClient {
## SPNEGO-based HTTP Authentication
HTTP access to the Atlas platform can be secured by enabling the platform's SPNEGO support. There are currently two supported authentication mechanisms:
HTTP accesses to the Atlas platform can be secured by enabling the platform's SPNEGO support. There are currently two supported authentication mechanisms:
* `simple` - authentication is performed via a provided user name
* `simple` - authentication is performed via a provided username
* `kerberos` - the KDC authenticated identity of the client is leveraged to authenticate to the server
The kerberos support requires the client accessing the server to first authenticate to the KDC (usually this is done via the 'kinit' command). Once authenticated, the user may access the server (the authenticated identity will be related to the server via the SPNEGO negotiation mechanism).
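For instance, once a Kerberos ticket is obtained, a SPNEGO-authenticated request could be issued as sketched below; the principal and host are placeholders.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# authenticate to the KDC, then let curl negotiate SPNEGO with the Atlas server (principal/host are placeholders)
kinit user@EXAMPLE.COM
curl --negotiate -u : http://atlas-host.company.com:21000/api/atlas/admin/version`}
</SyntaxHighlighter>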

View File

@ -13,7 +13,7 @@ import SyntaxHighlighter from 'react-syntax-highlighter';
### Building Apache Atlas
Download Apache Atlas 1.0.0 release sources, apache-atlas-1.0.0-sources.tar.gz, from the [downloads](#/Downloads) page.
Then follow the instructions below to to build Apache Atlas.
Then follow the instructions below to build Apache Atlas.
@ -34,7 +34,7 @@ mvn clean -DskipTests package -Pdist
* NOTES:
* Remove option '-DskipTests' to run unit and integration tests
* To build a distribution without minified js,css file, build with _skipMinify_ profile. By default js and css files are minified.
* To build a distribution without minified js,css file, build with _skipMinify_ profile. By default, js and css files are minified.
Above will build Apache Atlas for an environment having functional HBase and Solr instances. Apache Atlas needs to be setup with the following to run in this environment:

View File

@ -55,7 +55,7 @@ Elasticsearch is a prerequisite for Apache Atlas use. Set the following properti
<SyntaxHighlighter wrapLines={true} language="bash" style={theme.dark}>
{`atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.hostname=<hostname(s) of the Elasticsearch master nodes comma separated>
atlas.graph.index.search.hostname=<hostname(s) of the Elasticsearch master nodes, comma separated>
atlas.graph.index.search.elasticsearch.client-only=true`}
</SyntaxHighlighter>
@ -131,7 +131,7 @@ atlas.server.ha.zookeeper.connect=zk1.company.com:2181,zk2.company.com:2181,zk3.
atlas.server.ha.zookeeper.num.retries=3
# Specify how much time should the server wait before attempting connections to Zookeeper, in case of any connection issues.
atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
# Specify how long a session to Zookeeper should last without inactiviy to be deemed as unreachable.
# Specify how long a session to Zookeeper should last without inactivity to be deemed as unreachable.
atlas.server.ha.zookeeper.session.timeout.ms=20000
# Specify the scheme and the identity to be used for setting up ACLs on nodes created in Zookeeper for HA.
# The format of these options is <scheme:identity>.

View File

@ -43,8 +43,8 @@ Atlas command line tools are written in Python.
* Install the Scala IDE, TestNG, and m2eclipse-scala features/plugins as described below.
**Scala IDE Eclipse feature**
Some of the Atlas source code is written in the Scala programming language. The Scala IDE feature is required to compile Scala source code in Eclipse.
* In Eclipse, choose Help - Install New Software..
Some Atlas source code is written in the Scala programming language. The Scala IDE feature is required to compile Scala source code in Eclipse.
* In Eclipse, choose Help - Install New Software...
* Click Add... to add an update site, and set Location to http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site
* Select Scala IDE for Eclipse from the list of available features
* Restart Eclipse after install
@ -52,13 +52,13 @@ Some of the Atlas source code is written in the Scala programming language. The
*TestNG Eclipse plug-in*
Atlas tests use the [TestNG framework](http://testng.org/doc/documentation-main.html), which is similar to JUnit. The TestNG plug-in is required to run TestNG tests from Eclipse.
* In Eclipse, choose Help - Install New Software..
* In Eclipse, choose Help - Install New Software...
* Click Add... to add an update site, and set Location to http://beust.com/eclipse-old/eclipse_6.9.9.201510270734
* Choose TestNG and continue with install
* Restart Eclipse after installing the plugin
* In Window - Preferences - TestNG, <b>un</b>check "Use project TestNG jar"
*m2eclipse-scala Eclipse plugin*
* In Eclipse, choose Help - Install New Software..
* In Eclipse, choose Help - Install New Software...
* Click Add... to add an update site, and set Location to http://alchim31.free.fr/m2e-scala/update-site/
* Choose Maven Integration for Scala IDE, and continue with install
* Restart Eclipse after install
@ -95,7 +95,7 @@ g. Restart Eclipse
h. Choose Project - Clean, select Clean all projects, and click OK.
Some projects may not pick up the Scala library if this occurs, quick fix on those projects to add in the Scala library projects atlas-typesystem, atlas-repository, hdfs-model, storm-bridge and altas-webapp.
Some projects may not pick up the Scala library if this occurs, quick fix on those projects to add in the Scala library projects atlas-typesystem, atlas-repository, hdfs-model, storm-bridge and atlas-webapp.
You should now have a clean workspace.

View File

@ -68,7 +68,7 @@ To stop Apache Atlas, run following command:
### Configuring Apache Atlas
By default config directory used by Apache Atlas is _{package dir}/conf_. To override this set environment variable ATLAS_CONF to the path of the conf dir.
By default, config directory used by Apache Atlas is _{package dir}/conf_. To override this set environment variable ATLAS_CONF to the path of the conf dir.
Environment variables needed to run Apache Atlas can be set in _atlas-env.sh_ file in the conf directory. This file will be sourced by Apache Atlas scripts before any commands are executed. The following environment variables are available to set.
@ -82,25 +82,25 @@ Environment variables needed to run Apache Atlas can be set in _atlas-env.sh_ fi
# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=
# java heap size we want to set for the client. Default is 1024MB
# java heap size we want to set for the client. Default is 1024 MB
#export ATLAS_CLIENT_HEAP=
# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=
# java heap size we want to set for the atlas server. Default is 1024MB
# java heap size we want to set for the atlas server. Default is 1024 MB
#export ATLAS_SERVER_HEAP=
# What is is considered as atlas home dir. Default is the base location of the installed software
# What is considered as atlas home dir. Default is the base location of the installed software
#export ATLAS_HOME_DIR=
# Where log files are stored. Defatult is logs directory under the base install location
# Where log files are stored. Default is logs directory under the base install location
#export ATLAS_LOG_DIR=
# Where pid files are stored. Defatult is logs directory under the base install location
# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=
# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
# Where do you want to expand the war file. By Default, it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=`}
</SyntaxHighlighter>
@ -122,8 +122,8 @@ The following values are recommended for JDK 8:
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
</SyntaxHighlighter>
*NOTE for Mac OS users*
If you are using a Mac OS, you will need to configure the ATLAS_SERVER_OPTS (explained above).
*NOTE for macOS users*
If you are using a macOS, you will need to configure the ATLAS_SERVER_OPTS (explained above).
In _{package dir}/conf/atlas-env.sh_ uncomment the following line
<SyntaxHighlighter wrapLines={true} language="powershell" style={theme.dark}>
@ -169,7 +169,7 @@ SolrCloud mode uses a ZooKeeper Service as a highly available, central location
</SyntaxHighlighter>
* Run the following commands from SOLR_BIN (e.g. $SOLR_HOME/bin) directory to create collections in Apache Solr corresponding to the indexes that Apache Atlas uses. In the case that the Apache Atlas and Apache Solr instances are on 2 different hosts, first copy the required configuration files from ATLAS_HOME/conf/solr on the Apache Atlas instance host to Apache Solr instance host. SOLR_CONF in the below mentioned commands refer to the directory where Apache Solr configuration files have been copied to on Apache Solr host:
* Run the following commands from SOLR_BIN (e.g. $SOLR_HOME/bin) directory to create collections in Apache Solr corresponding to the indexes that Apache Atlas uses. In the case that the Apache Atlas and Apache Solr instances are on 2 different hosts, first copy the required configuration files from ATLAS_HOME/conf/solr on the Apache Atlas instance host to Apache Solr instance host. SOLR_CONF in the below-mentioned commands refer to the directory where Apache Solr configuration files have been copied to on Apache Solr host:
<SyntaxHighlighter wrapLines={true} language="powershell" style={theme.dark}>
{`$SOLR_BIN/solr create -c vertex_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
@ -178,7 +178,7 @@ $SOLR_BIN/solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replica
</SyntaxHighlighter>
Note: If numShards and replicationFactor are not specified, they default to 1 which suffices if you are trying out solr with ATLAS on a single node instance.
Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
Otherwise, specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
The number of replicas (replicationFactor) can be set according to the redundancy required.
@ -200,7 +200,7 @@ For more information on JanusGraph solr configuration , please refer http://docs
Pre-requisites for running Apache Solr in cloud mode
* Memory - Apache Solr is both memory and CPU intensive. Make sure the server running Apache Solr has adequate memory, CPU and disk.
Apache Solr works well with 32GB RAM. Plan to provide as much memory as possible to Apache Solr process
Apache Solr works well with 32 GB RAM. Plan to provide as much memory as possible to Apache Solr process
* Disk - If the number of entities that need to be stored are large, plan to have at least 500 GB free space in the volume where Apache Solr is going to store the index data
* SolrCloud has support for replication and sharding. It is highly recommended to use SolrCloud with at least two Apache Solr nodes running on different servers with replication enabled.
If using SolrCloud, then you also need ZooKeeper installed and configured with 3 or 5 ZooKeeper nodes
@ -208,7 +208,7 @@ Pre-requisites for running Apache Solr in cloud mode
* Start Apache Solr in http mode - alternative setup to Solr in cloud mode.
Solr Standalone is used for a single instance, and it keeps configuration information on the file system. It does not require zookeeper and provides high performance for medium size index.
Can be consider as a good option for fast prototyping as well as valid configuration for development environments. In some cases it demonstrates a better performance than solr cloud mode in production grade setup of Atlas.
Can be considered as a good option for fast prototyping as well as valid configuration for development environments. In some cases it demonstrates a better performance than solr cloud mode in production grade setup of Atlas.
* Change ATLAS configuration to point to Standalone Apache Solr instance setup. Please make sure the following configurations are set to the below values in ATLAS_HOME/conf/atlas-application.properties

View File

@ -34,7 +34,7 @@ atlas-index-repair/repair_index.py
This will result in vertex_index, edge_index and fulltext_index to be re-built completely. It is recommended that existing contents of these indexes be deleted before executing this restore.
###### Caveats
Note that the full index repair is a time consuming process. Depending on the size of data the process may take days to complete. During the restore process the Basic Search functionality will not be available. Be sure to allocate sufficient time for this activity.
Note that the full index repair is a time-consuming process. Depending on the size of data the process may take days to complete. During the restore process the Basic Search functionality will not be available. Be sure to allocate sufficient time for this activity.
##### Selective Restore

View File

@ -20,7 +20,7 @@ Atlas out of the box (like Hive tables, for e.g.) are modelled using types and r
types of metadata in Atlas, one needs to understand the concepts of the type system component.
## Types
A Type in Atlas is a definition of how a particular type of metadata objects are stored and accessed. A type represents one or a collection of attributes that define the properties for the metadata object. Users with a development background will recognize the similarity of a type to a Class definition of object oriented programming languages, or a table schema of relational databases.
A Type in Atlas is a definition of how a particular type of metadata objects are stored and accessed. A type represents one or a collection of attributes that define the properties for the metadata object. Users with a development background will recognize the similarity of a type to a Class definition of object-oriented programming languages, or a table schema of relational databases.
An example of a type that comes natively defined with Atlas is a Hive table. A Hive table is defined with these
attributes:
@ -56,14 +56,14 @@ The following points can be noted from the above example:
* Enum metatypes
* Collection metatypes: array, map
* Composite metatypes: Entity, Struct, Classification, Relationship
* Entity & Classification types can extend from other types, called supertype - by virtue of this, it will get to include the attributes that are defined in the supertype as well. This allows modellers to define common attributes across a set of related types etc. This is again similar to the concept of how Object Oriented languages define super classes for a class. It is also possible for a type in Atlas to extend from multiple super types.
* Entity & Classification types can extend from other types, called supertype - by virtue of this, it will get to include the attributes that are defined in the supertype as well. This allows modellers to define common attributes across a set of related types etc. This is again similar to the concept of how Object-Oriented languages define super classes for a class. It is also possible for a type in Atlas to extend from multiple super types.
* In this example, every hive table extends from a pre-defined supertype called a DataSet. More details about this pre-defined types will be provided later.
* Types which have a metatype of Entity, Struct, Classification or 'Relationship' can have a collection of attributes. Each attribute has a name (e.g. name) and some other associated properties. A property can be referred to using an expression type_name.attribute_name. It is also good to note that attributes themselves are defined using Atlas metatypes.
* In this example, hive_table.name is a String, hive_table.aliases is an array of Strings, hive_table.db refers to an instance of a type called hive_db and so on.
* Type references in attributes, (like hive_table.db) are particularly interesting. Note that using such an attribute, we can define arbitrary relationships between two types defined in Atlas and thus build rich models. Note that one can also collect a list of references as an attribute type (e.g. hive_table.columns which represents a list of references from hive_table to hive_column type)
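To make the ideas above concrete, the sketch below registers a new entity type that extends DataSet; the type and attribute names are invented for illustration, and the endpoint and credentials are assumed defaults.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# illustrative only: register a new entity type extending DataSet (type and attribute names are made up)
curl -u admin:admin -X POST -H "Content-Type: application/json" -d '{"entityDefs": [{"name": "sample_dataset", "superTypes": ["DataSet"], "attributeDefs": [{"name": "datasetId", "typeName": "string", "isOptional": false, "isUnique": true, "isIndexable": true, "cardinality": "SINGLE"}, {"name": "retentionDays", "typeName": "int", "isOptional": true, "cardinality": "SINGLE"}]}]}' http://localhost:21000/api/atlas/v2/types/typedefs`}
</SyntaxHighlighter>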
## Entities
An entity in Atlas is a specific value or instance of an Entity type and thus represents a specific metadata object in the real world. Referring back to our analogy of Object Oriented Programming languages, an instance is anObject of a certain Class.
An entity in Atlas is a specific value or instance of an Entity type and thus represents a specific metadata object in the real world. Referring back to our analogy of Object-Oriented Programming languages, an instance is anObject of a certain Class.
An example of an entity will be a specific Hive Table. Say Hive has a table called customers in the defaultdatabase. This table will be an entity in Atlas of type hive_table. By virtue of being an instance of an entity type, it will have values for every attribute that are a part of the Hive table type, such as:
@ -103,7 +103,7 @@ values:
The following points can be noted from the example above:
* Every instance ofan entity type is identified by a unique identifier, a GUID. This GUID is generated by the Atlas server when the object is defined, and remains constant for the entire lifetime of the entity. At any point in time, this particular entity can be accessed using its GUID.
* Every instance of an entity type is identified by a unique identifier, a GUID. This GUID is generated by the Atlas server when the object is defined, and remains constant for the entire lifetime of the entity. At any point in time, this particular entity can be accessed using its GUID.
* In this example, the customers table in the default database is uniquely identified by the GUID "9ba387dd-fa76-429c-b791-ffc338d3c91f"
* An entity is of a given type, and the name of the type is provided with the entity definition.
* In this example, the customers table is a hive_table.
@ -114,7 +114,7 @@ With this idea on entities, we can now see the difference between Entity and Str
## Attributes
We already saw that attributes are defined inside metatypes like Entity, Struct, Classification and Relationship. But we
implistically referred to attributes as having a name and a metatype value. However, attributes in Atlas have some more
implicitly referred to attributes as having a name and a metatype value. However, attributes in Atlas have some more
properties that define more concepts related to the type system.
An attribute has the following properties:
@ -133,13 +133,13 @@ The properties above have the following meanings:
* name - the name of the attribute
* dataTypeName - the metatype name of the attribute (native, collection or composite)
* isComposite -
* This flag indicates an aspect of modelling. If an attribute is defined as composite, it means that it cannot have a lifecycle independent of the entity it is contained in. A good example of this concept is the set of columns that make a part of a hive table. Since the columns do not have meaning outside of the hive table, they are defined as composite attributes.
* This flag indicates an aspect of modelling. If an attribute is defined as composite, it means that it cannot have a lifecycle independent of the entity it is contained in. A good example of this concept is the set of columns that make a part of a hive table. Since the columns do not have meaning outside the hive table, they are defined as composite attributes.
* A composite attribute must be created in Atlas along with the entity it is contained in. i.e. A hive column must be created along with the hive table.
* isIndexable -
* This flag indicates whether this property should be indexed on, so that look ups can be performed using the attribute value as a predicate and can be performed efficiently.
* This flag indicates whether this property should be indexed on, so that look-ups can be performed using the attribute value as a predicate and can be performed efficiently.
* isUnique -
* This flag is again related to indexing. If specified to be unique, it means that a special index is created for this attribute in JanusGraph that allows for equality based look ups.
* Any attribute with a true value for this flag is treated like a primary key to distinguish this entity from other entities. Hence care should be taken ensure that this attribute does model a unique property in real world.
* This flag is again related to indexing. If specified to be unique, it means that a special index is created for this attribute in JanusGraph that allows for equality based look-ups.
* Any attribute with a true value for this flag is treated like a primary key to distinguish this entity from other entities. Hence, care should be taken ensure that this attribute does model a unique property in real world.
* For e.g. consider the name attribute of a hive_table. In isolation, a name is not a unique attribute for a hive_table, because tables with the same name can exist in multiple databases. Even a pair of (database name, table name) is not unique if Atlas is storing metadata of hive tables amongst multiple clusters. Only a cluster location, database name and table name can be deemed unique in the physical world.
* multiplicity - indicates whether this attribute is required, optional, or could be multi-valued. If an entitys definition of the attribute value does not match the multiplicity declaration in the type definition, this would be a constraint violation and the entity addition will fail. This field can therefore be used to define some constraints on the metadata information.
@ -174,7 +174,7 @@ Note the “isOptional=true” constraint - a table entity cannot be created wit
always be bound to the table entity they are defined with.
From this description and examples, you will be able to realize that attribute definitions can be used to influence
specific modelling behavior (constraints, indexing, etc) to be enforced by the Atlas system.
specific modelling behavior (constraints, indexing, etc.) to be enforced by the Atlas system.
## System specific types and their significance
Atlas comes with a few pre-defined system types. We saw one example (DataSet) in preceding sections. In this
@ -193,14 +193,14 @@ make convention based assumptions about what attributes they can expect of types
**Infrastructure**: This type extends Asset and typically can be used to be a common super type for infrastructural
metadata objects like clusters, hosts etc.
**DataSet**: This type extends Referenceable. Conceptually, it can be used to represent an type that stores data. In Atlas,
hive tables, hbase_tables etc are all types that extend from DataSet. Types that extend DataSet can be expected to have
**DataSet**: This type extends Referenceable. Conceptually, it can be used to represent a type that stores data. In Atlas,
hive tables, hbase_tables etc. are all types that extend from DataSet. Types that extend DataSet can be expected to have
a Schema in the sense that they would have an attribute that defines attributes of that dataset. For e.g. the columns
attribute in a hive_table. Also entities of types that extend DataSet participate in data transformation and this
attribute in a hive_table. Also, entities of types that extend DataSet participate in data transformation and this
transformation can be captured by Atlas via lineage (or provenance) graphs.
**Process**: This type extends Asset. Conceptually, it can be used to represent any data transformation operation. For
example, an ETL process that transforms a hive table with raw data to another hive table that stores some aggregate can
be a specific type that extends the Process type. A Process type has two specific attributes, inputs and outputs. Both
inputs and outputs are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs
inputs and outputs are arrays of DataSet entities. Thus, an instance of a Process type can use these inputs and outputs
to capture how the lineage of a DataSet evolves.
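As a rough sketch of how lineage gets captured, a Process entity can be created with inputs and outputs that reference existing DataSet entities; the qualified names below are placeholders, and in practice a subtype of Process (such as a hook-specific process type) would normally be used.
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`# illustrative only: create a Process entity whose inputs/outputs reference existing tables by qualifiedName (placeholders below)
curl -u admin:admin -X POST -H "Content-Type: application/json" -d '{"entity": {"typeName": "Process", "attributes": {"qualifiedName": "sample_etl@cl1", "name": "sample_etl", "inputs": [{"typeName": "hive_table", "uniqueAttributes": {"qualifiedName": "default.raw_orders@cl1"}}], "outputs": [{"typeName": "hive_table", "uniqueAttributes": {"qualifiedName": "default.orders_agg@cl1"}}]}}}' http://localhost:21000/api/atlas/v2/entity`}
</SyntaxHighlighter>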

View File

@ -25,7 +25,7 @@ submenu: Whats New
* Notification processing to support batch-commits
* New option in notification processing to ignore potentially incorrect hive_column_lineage
* Updated Hive hook to avoid duplicate column-lineage entities; also updated Atlas server to skip duplicate column-lineage entities
* Improved batch processing in notificaiton handler to avoid processing of an entity multiple times
* Improved batch processing in notification handler to avoid processing of an entity multiple times
* Add option to ignore/prune metadata for temporary/staging hive tables
* Avoid unnecessary lookup when creating new relationships
* UI Improvements:

View File

@ -17,9 +17,9 @@ submenu: Whats New
## Enhancements
* **Search**: ability to find entities by more than one classification
* **Performance**: improvements in lineage retrieval and classification-propagation
* **Notification**: ability to process notificaitons from multiple Kafka topics
* **Notification**: ability to process notifications from multiple Kafka topics
* **Hive Hook**: tracks process-executions via hive_process_execution entities
* **Hive Hook**: catures DDL operations via hive_db_ddl and hive_table_ddl entities
* **Hive Hook**: captures DDL operations via hive_db_ddl and hive_table_ddl entities
* **Notification**: introduced shell entities to record references to non-existing entities in notifications
* **Spark**: added model to capture Spark entities, processes and relationships
* **AWS S3**: introduced updated model to capture AWS S3 entities and relationships