REST API Gateway for the Apache Hadoop Ecosystem

The Apache Knox Gateway is a REST API Gateway for interacting with Apache Hadoop clusters.

The Knox Gateway provides a single access point for all REST interactions with Apache Hadoop clusters.

In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control,
integration, monitoring and automation of critical administrative and analytical needs of the enterprise.

  • Authentication (LDAP and Active Directory Authentication Provider)
  • Federation/SSO (HTTP Header Based Identity Federation)
  • Authorization (Service Level Authorization)
  • Auditing

While there are a number of benefits for unsecured Apache Hadoop clusters,
the Knox Gateway also complements Kerberos-secured clusters well.

Coupled with proper network isolation of a Kerberos-secured Apache Hadoop cluster,
the Knox Gateway provides the enterprise with a solution that:

  • Integrates well with enterprise identity management solutions
  • Protects the details of the cluster deployment (hosts and ports are hidden from end users)
  • Reduces the number of services that clients need to interact with

Overview

The Knox API Gateway is designed as a reverse proxy that is pluggable in two areas: policy enforcement,
which is handled by providers, and the backend services for which it proxies requests.

Policy enforcement covers authentication/federation, authorization, audit, dispatch, host mapping
and content rewrite rules. Policy is enforced through a chain of providers that are defined within the topology
deployment descriptor for each Apache Hadoop cluster gated by Knox. The same descriptor also contains the
cluster definition, which provides the Knox Gateway with the layout of the
cluster for purposes of routing and translation between user-facing URLs and cluster internals.
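
To make the idea of a provider chain concrete, the following is a minimal Python sketch, not Knox's actual implementation. Every name in it (Request, the X-Demo-User header, the allowed-user set, run_chain) is hypothetical and chosen only for illustration. Each provider either lets the request continue or ends the chain with a response.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Response = Tuple[int, str]


@dataclass
class Request:
    path: str
    headers: dict
    user: Optional[str] = None
    trail: List[str] = field(default_factory=list)  # providers that ran


# A provider returns None to pass the request on, or a Response to stop the chain.
Provider = Callable[[Request], Optional[Response]]


def authentication(req: Request) -> Optional[Response]:
    # Hypothetical: trust a demo header instead of validating real credentials.
    user = req.headers.get("X-Demo-User")
    if user is None:
        return (401, "authentication required")
    req.user = user
    req.trail.append("authentication")
    return None


def authorization(req: Request) -> Optional[Response]:
    # Hypothetical policy: only "alice" may use the service.
    if req.user not in {"alice"}:
        return (403, "forbidden")
    req.trail.append("authorization")
    return None


def dispatch(req: Request) -> Optional[Response]:
    # Stand-in for forwarding the request to the backend service.
    req.trail.append("dispatch")
    return (200, f"proxied {req.path} as {req.user}")


def run_chain(providers: List[Provider], req: Request) -> Response:
    for provider in providers:
        response = provider(req)
        if response is not None:
            return response
    return (500, "no provider produced a response")


CHAIN = [authentication, authorization, dispatch]
```

Because the chain is just an ordered list, swapping or adding a provider does not require changes to the others, which is the property the pluggable design is after.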

Each Apache Hadoop cluster that is protected by Knox has its set of REST APIs represented by a single cluster-specific
application context path. This allows the Knox Gateway to both protect multiple clusters and present
the REST API consumer with a single endpoint for access to all of the required services across
those clusters.
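
The translation between a user-facing URL and a cluster-internal URL can be sketched as follows. This is an illustration under assumptions: the gateway path "gateway", the topology names, and all internal host names and ports are hypothetical, and real routing also involves the rewrite rules mentioned above.

```python
from urllib.parse import urlsplit

# Hypothetical cluster layouts: topology name -> service name -> internal base URL.
TOPOLOGIES = {
    "sandbox": {"webhdfs": "http://nn1.internal:50070/webhdfs"},
    "prod": {"webhdfs": "http://nn2.internal:50070/webhdfs"},
}


def to_internal(gateway_url: str, gateway_path: str = "gateway") -> str:
    """Map /{gateway_path}/{topology}/{service}/... onto the internal service URL."""
    split = urlsplit(gateway_url)
    # Expected parts: "", gateway_path, topology, service, remainder of the path.
    parts = split.path.split("/", 4)
    if len(parts) < 4 or parts[1] != gateway_path:
        raise ValueError(f"not a gateway URL: {gateway_url}")
    topology, service = parts[2], parts[3]
    rest = parts[4] if len(parts) > 4 else ""
    base = TOPOLOGIES[topology][service]
    internal = f"{base}/{rest}" if rest else base
    return f"{internal}?{split.query}" if split.query else internal
```

The client only ever sees the gateway host and the cluster-specific context path; the internal host and port stay in the lookup table.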

Simply by writing a topology deployment descriptor to the topologies directory of the Knox installation, a
new Apache Hadoop cluster definition is processed, the policy enforcement providers are configured and the application
context path is made available for use by API consumers.
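
As a rough illustration of that workflow, the sketch below builds a small topology descriptor and writes it to a directory. The element names follow the commonly documented topology layout (topology, gateway, provider, service), but the provider list, the service URL, and the target directory are assumptions; check the Knox user guide for the parameters each provider requires before using anything like this.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_topology(providers, services):
    """providers: (role, name) pairs; services: (role, url) pairs."""
    topology = ET.Element("topology")
    gateway = ET.SubElement(topology, "gateway")
    for role, name in providers:
        provider = ET.SubElement(gateway, "provider")
        ET.SubElement(provider, "role").text = role
        ET.SubElement(provider, "name").text = name
        ET.SubElement(provider, "enabled").text = "true"
    for role, url in services:
        service = ET.SubElement(topology, "service")
        ET.SubElement(service, "role").text = role
        ET.SubElement(service, "url").text = url
    return topology


def deploy(topology, topologies_dir, cluster_name):
    """Write the descriptor as <cluster_name>.xml and return its path."""
    target = Path(topologies_dir) / f"{cluster_name}.xml"
    ET.ElementTree(topology).write(str(target), encoding="utf-8", xml_declaration=True)
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the topologies directory of a Knox install.
    with tempfile.TemporaryDirectory() as topologies_dir:
        descriptor = build_topology(
            providers=[("authentication", "ShiroProvider"), ("authorization", "AclsAuthz")],
            services=[("WEBHDFS", "http://nn1.internal:50070/webhdfs")],  # hypothetical host
        )
        print(deploy(descriptor, topologies_dir, "sandbox").read_text(encoding="utf-8"))
```

In this sketch the file name doubles as the cluster name, so "sandbox.xml" would correspond to a context path ending in "/sandbox".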

Supported Apache Hadoop Services

The following Apache Hadoop services have integrations with the Knox Gateway:

  • Ambari
  • WebHDFS (HDFS)
  • Templeton (HCatalog)
  • Stargate (HBase)
  • Oozie
  • Hive/JDBC
  • YARN RM
  • Storm

Supported Apache Hadoop UIs

  • Name Node UI
  • Job History UI
  • Oozie UI
  • HBase UI
  • YARN UI
  • Spark UI
  • Ambari UI
  • Ranger Admin Console

Configuring Support for new services and UIs

Apache Knox provides a configuration-driven method of adding new routing services.
This enables new Apache Hadoop REST APIs to come on board quickly and easily. It also enables
users and developers to add support for custom REST APIs to the Knox Gateway.
This capability was added in release 0.6.0 and furthers the Knox commitment to extensibility and integration.

Authentication

Providers with the role of authentication are responsible for collecting credentials presented by the API
consumer, validating them and communicating the successful or failed authentication to the client or the
rest of the provider chain.

Out of the box, the Knox Gateway provides the Shiro authentication provider. This is a provider that leverages
the Apache Shiro project for authenticating BASIC credentials against an LDAP user store. There is support for
OpenLDAP, ApacheDS and Microsoft Active Directory.
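
For readers unfamiliar with BASIC credentials, the sketch below shows how a client encodes them into an HTTP Authorization header and how a server-side component could decode them again. It covers only the encoding defined by HTTP Basic authentication; the LDAP lookup that the Shiro provider performs afterwards is not shown, and the helper names are hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the header a REST client would send with each request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def parse_basic_auth(header_value: str) -> tuple:
    """Recover (username, password) from an Authorization header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Only the first colon separates the fields; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Because the encoding is reversible and not encrypted, Basic credentials are only appropriate over TLS.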

Federation/SSO

For customers that require credentials to be presented to a limited set of trusted entities within the enterprise,
the Knox Gateway may be configured to federate the authenticated identity from an external authentication event.
This is done through providers with the role of federation. The set of out-of-the-box federation providers includes:

KnoxSSO Default Form-based IDP -

The default configuration of KnoxSSO provides a form-based authentication mechanism that leverages the Shiro authentication
provider to authenticate against LDAP/AD with credentials collected from a form-based challenge.

Pac4J -

The pac4j provider adds numerous authentication and federation capabilities, such as SAML, CAS, OpenID Connect, Google
and Twitter.

HeaderPreAuth -

A simple mechanism for propagating the identity through HTTP headers that specify the username and group for the
authenticated user. This has been built with vendor use cases such as SiteMinder and IBM Tivoli Access Manager in mind.
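
The header-based approach can be sketched as below. This is an illustration, not the HeaderPreAuth provider's code: the header names, the comma-separated group format, and the trusted-proxy check are assumptions made for the example. The check reflects a general point about this style of federation: headers are only trustworthy when the request can come solely from the component that performed the authentication.

```python
def identity_from_headers(headers, remote_addr, trusted_proxies,
                          user_header="SM_USER", group_header="SM_GROUP"):
    """Return (user, groups) asserted by an upstream authenticating proxy.

    Header names and the comma-separated group format are hypothetical defaults.
    """
    if remote_addr not in trusted_proxies:
        raise PermissionError(f"untrusted source address: {remote_addr}")
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    user = lowered.get(user_header.lower(), "").strip()
    if not user:
        raise PermissionError("no user identity header present")
    raw_groups = lowered.get(group_header.lower(), "")
    groups = [g.strip() for g in raw_groups.split(",") if g.strip()]
    return user, groups
```

A request arriving from any address outside trusted_proxies is rejected before the headers are even read, since a client could otherwise set them itself.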

KnoxSSO

The KnoxSSO service is an integration service that provides a normalized SSO token for representing the authenticated user.
This token is generally used for WebSSO capabilities for participating UIs and their consumption of the Apache Hadoop REST APIs.
KnoxSSO abstracts the actual identity provider integration away from participating applications so that they only need to
be aware of the KnoxSSO cookie. The token is presented by the browser as a cookie and applications that are participating in
the KnoxSSO integration are able to cryptographically validate the presented token and remain agnostic to the underlying
SSO integration.
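
The value of a cryptographically verifiable token is that a participating application can check it without contacting the identity provider. The sketch below illustrates that idea only. It deliberately swaps in a simplified token (a base64url payload plus an HMAC-SHA256 signature over a shared secret), which is not the KnoxSSO token format; the function names and claim names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str, secret: bytes) -> str:
    return _b64(hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest())


def issue_token(subject: str, secret: bytes, ttl_seconds: int = 300, now=None) -> str:
    """Create 'payload.signature' for the authenticated subject."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "exp": now + ttl_seconds}
    payload = _b64(json.dumps(claims).encode("utf-8"))
    return f"{payload}.{_sign(payload, secret)}"


def validate_token(token: str, secret: bytes, now=None) -> str:
    """Return the subject if the signature is valid and the token has not expired."""
    now = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    expected = _sign(payload, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("invalid signature")
    claims = json.loads(_unb64(payload))
    if claims["exp"] < now:
        raise ValueError("token expired")
    return claims["sub"]
```

An application that holds only the verification key can call validate_token on the cookie value and never needs to know whether the user originally signed in through a form, SAML, or another mechanism.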

Authorization

The authorization role is used by providers that make access decisions for the requested resources based on the
effective user identity context. This identity context is determined by the authentication provider and the identity
assertion provider mapping rules. Evaluation of the identity context’s user and group principals against a set of
access policies is done by the authorization provider in order to determine whether access should be granted to
the effective user for the requested resource.

Out of the box, the Knox Gateway provides an ACL-based authorization provider that evaluates rules that consist
of usernames, groups and IP addresses. These ACLs are bound to and protect resources at the service level.
That is, they protect access to the Apache Hadoop services themselves based on user, group and remote IP address.
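
A service-level ACL check of this kind can be sketched as follows. The rule syntax used here (three semicolon-separated fields for users, groups and IP addresses, comma-separated values, and "*" as a wildcard) and the requirement that all three fields match are assumptions for illustration; the exact syntax and matching modes of the shipped provider should be taken from the Knox user guide.

```python
def parse_acl(acl: str):
    """Split 'users;groups;ips' into three sets; None means 'match anything'."""
    users, groups, ips = (
        None if part.strip() == "*"
        else {value.strip() for value in part.split(",") if value.strip()}
        for part in acl.split(";")
    )
    return users, groups, ips


def is_allowed(acl: str, user: str, user_groups, remote_ip: str) -> bool:
    """Grant access only if the user, group and IP fields all match."""
    users, groups, ips = parse_acl(acl)
    user_ok = users is None or user in users
    group_ok = groups is None or bool(groups & set(user_groups))
    ip_ok = ips is None or remote_ip in ips
    return user_ok and group_ok and ip_ok
```

For example, the hypothetical rule "hdfs;admin;*" would admit user hdfs in group admin from any address, while leaving every other user without access to that service.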

Audit

The ability to determine what actions were taken by whom during some period of time is provided by the auditing
capabilities of the Knox Gateway. The facility is built on an extension of the Log4j framework and may be extended
by replacing the out of the box implementation with another.