REST API Gateway for the Hadoop Ecosystem

The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters.

The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters.

In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control,
integration, monitoring and automation of critical administrative and analytical needs of the enterprise.

  • Authentication (LDAP and Active Directory Authentication Provider)
  • Federation/SSO (HTTP Header Based Identity Federation)
  • Authorization (Service Level Authorization)
  • Auditing

While there are a number of benefits for unsecured Hadoop clusters,
the Knox Gateway also complements a Kerberos-secured cluster quite nicely.

Coupled with proper network isolation of a Kerberos-secured Hadoop cluster,
the Knox Gateway provides the enterprise with a solution that:

  • Integrates well with enterprise identity management solutions
  • Protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users)
  • Reduces the number of services that clients need to interact with


The Knox API Gateway is designed as a reverse proxy, with pluggability in two areas: policy enforcement,
which is handled by providers, and the backend services for which it proxies requests.

Policy enforcement covers authentication/federation, authorization, audit, dispatch, host mapping
and content rewrite rules. Policy is enforced through a chain of providers that are defined within the topology
deployment descriptor for each Hadoop cluster gated by Knox. The cluster definition is also specified
within the topology deployment descriptor and provides the Knox Gateway with the layout of the Hadoop
cluster for purposes of routing and translation between user-facing URLs and Hadoop cluster internals.

Each Hadoop cluster that is protected by Knox has its set of REST APIs represented by a single cluster specific
application context path. This allows the Knox Gateway to both protect multiple Hadoop clusters and present
the REST API consumer with a single endpoint for access to all of the Hadoop services required, across the
multiple clusters.
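
As an illustration of the single endpoint, the sketch below builds gateway URLs for two clusters behind one gateway address. The host name, port, the `gateway` path segment, the cluster names and the service path are all assumptions chosen for the example, not values taken from a real deployment.

```python
from urllib.parse import quote


def gateway_url(host, port, gateway_path, cluster, service_path):
    """Join the gateway address, the cluster-specific context path and a service path."""
    segments = [gateway_path.strip("/"), quote(cluster, safe=""), service_path.strip("/")]
    return f"https://{host}:{port}/" + "/".join(s for s in segments if s)


# Hypothetical values: two clusters reached through one gateway address.
for cluster in ("sandbox", "production"):
    print(gateway_url("knox.example.com", 8443, "gateway", cluster, "templeton/v1/status"))
```

Only the cluster segment changes between the two URLs; the internal hosts and ports of each cluster never appear in what the client sees.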

Simply by writing a topology deployment descriptor to the topologies directory of the Knox installation, a
new Hadoop cluster definition is processed, the policy enforcement providers are configured and the application
context path is made available for use by API consumers.
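
To make the shape of a topology deployment descriptor more concrete, here is a minimal sketch that assembles one as XML. The element layout (a `gateway` section holding the provider chain, followed by `service` entries), the provider names, the service role and the internal URL are assumptions for illustration and should be checked against the Knox user guide for the release in use.

```python
import xml.etree.ElementTree as ET


def build_topology(providers, services):
    """Return a topology document as an XML string.

    providers: list of (role, name) pairs, in chain order.
    services:  list of (role, url) pairs describing the cluster layout.
    """
    topology = ET.Element("topology")
    gateway = ET.SubElement(topology, "gateway")
    for role, name in providers:
        provider = ET.SubElement(gateway, "provider")
        ET.SubElement(provider, "role").text = role
        ET.SubElement(provider, "name").text = name
        ET.SubElement(provider, "enabled").text = "true"
    for role, url in services:
        service = ET.SubElement(topology, "service")
        ET.SubElement(service, "role").text = role
        ET.SubElement(service, "url").text = url
    return ET.tostring(topology, encoding="unicode")


# Hypothetical provider chain and cluster layout.
xml_text = build_topology(
    providers=[("authentication", "ShiroProvider"), ("authorization", "AclsAuthz")],
    services=[("WEBHCAT", "http://hadoop-internal.example.com:50111/templeton")],
)
print(xml_text)
```

Saving the resulting text as a file in the topologies directory is the step the paragraph above describes; the sketch deliberately stops short of writing to disk.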

Supported Hadoop Services

The following Hadoop services have integrations with the Knox Gateway:

  • Templeton (HCatalog)
  • Stargate (HBase)
  • YARN RM

Apache Knox provides a configuration-driven method of adding new routing services.
This enables new Hadoop REST APIs to come on board quickly and easily. It also enables
users and developers to add support for custom REST APIs to the Knox Gateway.
This capability was added in release 0.6.0 and furthers the Knox commitment to extensibility and integration.


Providers with the role of authentication are responsible for collecting credentials presented by the API
consumer, validating them and communicating the successful or failed authentication to the client or the
rest of the provider chain.

Out of the box, the Knox Gateway provides the Shiro authentication provider. This is a provider that leverages
the Apache Shiro project for authenticating BASIC credentials against an LDAP user store. There is support for
OpenLDAP, ApacheDS and Microsoft Active Directory.
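
For readers unfamiliar with BASIC credentials, the following sketch shows how a client encodes them into an `Authorization` header and how a receiver could decode them again. It illustrates the HTTP Basic scheme only; it is not Knox or Shiro code, and the username and password are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Authorization header for BASIC credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def parse_basic_auth(header_value):
    """Recover (username, password) from a BASIC Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a BASIC Authorization header")
    # The password may itself contain ':', so split on the first one only.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Placeholder credentials for illustration.
headers = basic_auth_header("guest", "guest-password")
print(headers)
print(parse_basic_auth(headers["Authorization"]))
```

Because the encoding is reversible, BASIC credentials are only as safe as the transport, which is one reason to send them over HTTPS.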


For customers that require credentials to be presented only to a limited set of trusted entities within the enterprise,
the Knox Gateway may be configured to federate the authenticated identity from an external authentication event.
This is done through providers with the role of federation. The out-of-the-box federation provider is a simple
mechanism for propagating the identity through HTTP headers that specify the username and group for the authenticated
user. This has been built with vendor use cases such as SiteMinder and IBM Tivoli Access Manager in mind.
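
Header-based identity propagation can be sketched as follows. The header names `X-Remote-User` and `X-Remote-Groups` and the comma-separated group format are hypothetical choices for this example; the names a real deployment uses depend on the upstream product and on how the federation provider is configured.

```python
def identity_from_headers(headers, user_header="X-Remote-User", group_header="X-Remote-Groups"):
    """Return (username, groups) asserted by a trusted upstream, or None if absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    username = normalized.get(user_header.lower(), "").strip()
    if not username:
        return None  # no identity was asserted; the request is not federated
    raw_groups = normalized.get(group_header.lower(), "")
    groups = [g.strip() for g in raw_groups.split(",") if g.strip()]
    return username, groups


# Hypothetical request headers as set by an upstream authentication product.
print(identity_from_headers({"x-remote-user": "alice", "X-Remote-Groups": "analysts, admins"}))
print(identity_from_headers({"Accept": "application/json"}))
```

A scheme like this is only sound when requests can reach the gateway exclusively through the trusted entity, since any client able to connect directly could set the same headers.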


The authorization role is used by providers that make access decisions for the requested resources based on the
effective user identity context. This identity context is determined by the authentication provider and the identity
assertion provider mapping rules. Evaluation of the identity context’s user and group principals against a set of
access policies is done by the authorization provider in order to determine whether access should be granted to
the effective user for the requested resource.

Out of the box, the Knox Gateway provides an ACL-based authorization provider that evaluates rules composed
of usernames, groups and IP addresses. These ACLs are bound to and protect resources at the service level.
That is, they protect access to the Hadoop services themselves based on user, group and remote IP address.
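
The kind of check such a rule implies can be sketched as below. The `users;groups;addresses` rule syntax, the `*` wildcard and the requirement that all three fields match are assumptions made for this example; this is not the provider's actual implementation, and the exact rule format should be taken from the Knox user guide.

```python
def parse_acl(rule):
    """Split 'users;groups;addresses' into three sets; '*' means any value."""
    parts = rule.split(";")
    if len(parts) != 3:
        raise ValueError("expected exactly three ';'-separated fields")
    return [set(v.strip() for v in part.split(",") if v.strip()) for part in parts]


def is_allowed(rule, username, groups, remote_ip):
    """Grant access only if the user, group and address fields all match."""
    users, allowed_groups, addresses = parse_acl(rule)
    user_ok = "*" in users or username in users
    group_ok = "*" in allowed_groups or bool(allowed_groups & set(groups))
    ip_ok = "*" in addresses or remote_ip in addresses
    return user_ok and group_ok and ip_ok


# Hypothetical rule protecting one service: two named users, one group, any address.
rule = "alice,bob;analysts;*"
print(is_allowed(rule, "alice", ["analysts"], "10.0.0.5"))
print(is_allowed(rule, "carol", ["analysts"], "10.0.0.5"))
```

Binding one such rule to each service name gives the service-level granularity described above: the decision depends on who is calling and from where, not on the individual resource path.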


The ability to determine what actions were taken by whom during some period of time is provided by the auditing
capabilities of the Knox Gateway. The facility is built on an extension of the Log4j framework and may be extended
by replacing the out-of-the-box implementation with another.