Data will be delivered from back-end systems to the NYC OpenData portal through an architecture that permits de-coupling and provides a layer of abstraction. This architecture leverages infrastructure investments and technical capabilities that either already exist within Agencies or would require minimal effort to implement.
A de-coupled architecture provides the following benefits:
Figure 1 – Logical Integration Architecture
ETL: Extract-Transform-Load – a process by which data is extracted from a source system, manipulated as required by business rules, and then loaded into another system.
OLAP: Online Analytical Processing data storage – typically used for reporting and data mining capabilities. Data sets are typically de-normalized, and information therein may be routinely extracted from OLTP systems.
OLTP: Online Transactional Processing data storage – typically used for data entry and retrieval. Data sets are typically highly normalized, and information therein may be routinely extracted, transformed, and loaded into OLAP systems.
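The relationship between these three terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `restaurant` and `inspection` tables, their columns, the `inspection_report` target table, and the clean-up rules are all hypothetical and do not describe any Agency system. The sketch extracts rows from two normalized (OLTP-style) tables, applies simple transformations, and loads the result into one de-normalized (OLAP-style) table.

```python
import sqlite3


def build_demo_source() -> sqlite3.Connection:
    """Create a hypothetical, normalized OLTP-style source in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT, borough TEXT);
        CREATE TABLE inspection (
            id INTEGER PRIMARY KEY, restaurant_id INTEGER,
            inspected_on TEXT, grade TEXT
        );
        INSERT INTO restaurant VALUES (1, ' Example Diner ', 'Queens');
        INSERT INTO inspection VALUES (1, 1, '2013-01-15', 'a');
        INSERT INTO inspection VALUES (2, 1, '2013-07-02', 'b');
        """
    )
    return conn


def run_etl(conn: sqlite3.Connection) -> int:
    """Extract, transform, and load; return the number of rows loaded."""
    # Extract: join the normalized tables into one row per inspection.
    rows = conn.execute(
        "SELECT r.name, r.borough, i.inspected_on, i.grade "
        "FROM inspection AS i JOIN restaurant AS r ON r.id = i.restaurant_id "
        "ORDER BY i.id"
    ).fetchall()
    # Transform: example business rules (trim names, upper-case grades).
    flat = [(n.strip(), b, d, g.upper()) for n, b, d, g in rows]
    # Load: write the de-normalized rows into the reporting table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inspection_report "
        "(name TEXT, borough TEXT, inspected_on TEXT, grade TEXT)"
    )
    conn.executemany("INSERT INTO inspection_report VALUES (?, ?, ?, ?)", flat)
    conn.commit()
    return len(flat)
```

Calling `run_etl(build_demo_source())` loads two flat rows. In practice the extract and load steps would target separate systems rather than one in-memory database.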
Data sets may contain the following data types:
Please refer to the NYC OpenData portal’s import specifications for details on formatting and parsing of the above data types.
Data values must not contain elements or markup used for presentation, nor should they contain interpreted or raw application source code. For example, HTML formatting tags such as <script>, <table>, <tr>, <td>, or <br> are not permitted.
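An Agency could screen data values for markup before submission with a check along the following lines. This is a minimal sketch, not a portal feature: the function names and the regular expression are assumptions made for illustration, and a pattern this simple can flag legitimate text that happens to look like a tag (for example `<y>`), so flagged cells should be reviewed rather than rejected automatically.

```python
import re

# Matches anything shaped like an HTML/XML tag, e.g. <br>, </td>, <script src="x">.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")


def find_markup(value: str) -> list[str]:
    """Return the tag-like substrings found in a single data value."""
    return [match.group(0) for match in TAG_PATTERN.finditer(value)]


def rows_with_markup(rows: list[dict[str, str]]) -> list[tuple[int, str]]:
    """Return (row index, column name) for every cell that contains markup."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column, value in row.items()
        if isinstance(value, str) and find_markup(value)
    ]
```

Comparison operators in ordinary text, such as `width < 5`, are not flagged because the pattern requires a letter immediately after the opening angle bracket.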
Geospatial data must be published in the Web Mercator coordinate system (WGS 84/EPSG:3857) to make the data easy to use with popular online mapping services. Although this is the most useful coordinate system for web-based mapping, Web Mercator distorts distance and area, and the distortion grows with distance from the equator. Measurements of distance and area made on such data will therefore not be as accurate as measurements made in a projected coordinate system designed for the local area.
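To illustrate the accuracy caveat above, the following sketch uses the standard spherical Mercator formulas to convert WGS 84 longitude and latitude to Web Mercator coordinates and to estimate how much lengths are stretched at a given latitude. The function names are hypothetical, and the scale factor is the spherical approximation rather than an exact ellipsoidal value.

```python
import math

# WGS 84 semi-major axis in metres, used as the sphere radius by EPSG:3857.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Convert WGS 84 longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg: float) -> float:
    """Approximate factor by which Web Mercator stretches lengths at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))
```

At a latitude of about 40.7 degrees north, roughly that of New York City, `mercator_scale_factor` returns about 1.32. Distances measured directly in Web Mercator units are therefore overstated by roughly 32 percent, and areas by the square of that factor.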
Agencies may also make their data available in the New York State Plane, Long Island Zone, coordinate system (FIPS Zone 3104/EPSG:2263). If the data is hosted directly through the OpenData platform, it will be automatically converted to Web Mercator.
Data sets providing information on location in tabular format can be automatically geocoded by the OpenData platform.
For each data set published, the providing Agency must, at a minimum, provide values for all of the metadata elements defined in the latest version of the Dublin Core Metadata Element Set. In addition, the Agency must provide the metadata element "frequency", whose value must correspond to a value contained in the Dublin Core Collection Description Frequency Vocabulary. The following table lists the required metadata elements for data sets as of the publication of this technical standard:
Label | Description | Permitted Values (if applicable) |
---|---|---|
Contributor | Indicates the agency that supplied the data. | |
Coverage | Indicates the range of data from either a temporal or spatial perspective. | |
Creator | Indicates the agency that supplied the data. | |
Date | Auto-generated by Socrata when data set (or metadata) is modified. | |
Description | A brief description of the data set. | |
Format | Dependent upon export methodology. Refer to Public Standards for more information. | |
Frequency | Indicates the rate at which the information in the data set is updated. | Not updated [historical only] |
Identifier | Socrata uses a 9-character identifier (usually of the form xxxx-xxxx); the "resource name" field may offer an option for better permalinking. | |
Language | Language of the data set. Assumed to be en-US for all data sets. Exceptions must be noted. | en-US |
Publisher | Entity that is responsible for publishing the data; this will always be the City of New York. | City of New York |
Relation | Not used. | |
Rights | NYC data sets should be attributed to the City. Refer to Public Policies. | |
Source | Identifies the name of the source system within the City. | |
Subject | Comma-separated list of nouns describing the content of the data set. | |
Title | The brief descriptive name of the data set. | |
Type | The category of the data set identified by the list of possible values. If a data set can fall into multiple categories, select the one which is most significant. This list will be subject to change on an ongoing basis. | Business and Economic |
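The table above lends itself to an automated completeness check before submission. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that "Date" and "Identifier" are assigned by the platform and that "Relation" is unused, so those three labels are not required from the Agency. It does not check values against the Frequency or Type vocabularies.

```python
# Labels from the table above that this sketch assumes the Agency supplies.
REQUIRED_LABELS = [
    "Contributor", "Coverage", "Creator", "Description", "Format",
    "Frequency", "Language", "Publisher", "Rights", "Source",
    "Subject", "Title", "Type",
]

# Values stated in the table: Publisher is fixed, Language defaults to en-US.
FIXED_VALUES = {"Publisher": "City of New York"}
DEFAULT_VALUES = {"Language": "en-US"}


def validate_metadata(metadata: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    record = {**DEFAULT_VALUES, **metadata}
    problems = []
    for label in REQUIRED_LABELS:
        if not str(record.get(label, "")).strip():
            problems.append(f"missing value for {label}")
    for label, expected in FIXED_VALUES.items():
        value = record.get(label)
        if value and value != expected:
            problems.append(f"{label} must be '{expected}', got '{value}'")
    return problems
```

A record that omits "Language" passes because the default of en-US is applied, whereas a record that omits "Title" or names a different publisher is reported.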
Although metadata for columns within a data set is not required, it should be provided when a column identifier alone does not give a user enough information to use the column effectively. For example, the metadata for a column containing restaurant inspection letter grades should indicate the possible values and their meanings.
An Agency should include any preferred citation for a data set in the data set’s metadata or supporting documentation.
The Agency ODC should work closely with DoITT during the initial data set publishing process to identify the best technical approach to automate delivery to the public. The following mechanisms are supported:
For Agencies that require DoITT assistance to extract data from back-office systems, the Agency must provide read-only DBMS credentials for the necessary databases, tables, stored procedures, and/or views. The credentials should not permit access to tables, columns, or other entities containing information that falls outside the definition of a public data set because it is exempt from disclosure. If the Agency operates a data warehouse, it should provide access to extract public data sets from the warehouse rather than from the source operational system.
Agencies may choose to publish files to a location on the City intranet that DoITT staff or DoITT-managed automation tools can access. Specific details, such as location, formats, naming conventions, and sizing, should be discussed with DoITT.
Agencies may leverage DoITT’s Enterprise Service Bus (DataShare) to publish public data sets. This option may be especially desirable if DataShare already automatically transfers the data set.
In any exceptional case in which transaction volumes, data structure, technical barriers, or resource limits prevent hosting a public data set on the NYC OpenData portal itself, the NYC OpenData portal must provide a direct link to the public data set hosted elsewhere, so that the data set remains accessible to the public through the portal. In such a case, an Agency may self-host the relevant public data set, provided that it is accessible to the public through the link on the NYC OpenData portal according to the following standards:
Data sets published on the NYC OpenData portal must be maintained for accuracy, timeliness, and accessibility, as set forth below.
Agency ODCs are responsible for identifying an update frequency for each public data set as an element in its data set metadata, and for ensuring that their data set content updates are maintained and published according to the data set’s identified schedule or to the extent that the agency regularly maintains or updates the public data set.
The ODC or Agency liaison must not modify existing data structure during normal updates to the data set. The number of data elements per record, name, format, and order of the data elements must be consistent with the originally-published version. The Agency ODC should notify DoITT prior to any structural changes to data sets.
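A structural check of this kind can be run before each update is delivered. The following sketch assumes the data set is exchanged as CSV text with a header row, which is an assumption made for illustration; the function names are hypothetical. It compares the number, names, and order of columns against the originally published header and does not inspect the format of individual values.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def structural_changes(published: list[str], update: list[str]) -> list[str]:
    """Describe differences in column count, names, or order between headers."""
    changes = []
    if len(published) != len(update):
        changes.append(
            f"column count changed from {len(published)} to {len(update)}"
        )
    removed = [name for name in published if name not in update]
    added = [name for name in update if name not in published]
    if removed:
        changes.append(f"columns removed: {', '.join(removed)}")
    if added:
        changes.append(f"columns added: {', '.join(added)}")
    # Same set of names but a different sequence means the columns were reordered.
    if not removed and not added and published != update:
        changes.append("column order changed")
    return changes
```

An empty result means the update matches the published structure. Any other result is a structural change of the kind that should be raised with DoITT before the update is published.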
DoITT will contact the Agency ODC to obtain feedback or a direct answer to comments or inquiries from the public that relate to data set contents or supporting documentation. The Agency will, as soon as possible, provide DoITT with an expected timeframe for resolving the support inquiry. The Agency must then notify DoITT when the updates or corrections are ready for publication. An Agency that proactively identifies defects or improvements related to its data set content or supporting documentation must notify DoITT prior to publication of any changes.
Agencies retain ownership over the data sets that they submit. All data and data sets remain the property of the originating Agency and public users acquire no ownership rights to Agency data or data sets. The data sets published on NYC.gov or the NYC OpenData portal become a public resource available to anyone with access to the Internet. The public use of the data sets may include development of applications. In this case, the developers retain all intellectual property ownership in their applications, excluding the Agency data itself, whose ownership continues to reside with the Agency.
The Agency that owns the data set is responsible for all aspects of the quality, integrity, and security of the data set contents, as detailed below, and as subject to limitations on liability contained in Local Law 11. Agencies do not relinquish control of their data to DoITT when the data set is submitted for publication on the NYC OpenData portal.
Agencies are responsible for ensuring that all of their submitted data has been reviewed by appropriate Agency management for confidentiality, privacy, security, and all other content limitation issues consistent with Local Law 11 before the data is submitted for publication. The Agency supplying the data is also responsible for maintaining records of information privacy status and public-disclosure requirements. The Agency is responsible for updating its data according to the frequency identified in the data set metadata or to the extent that the agency regularly maintains or updates the public data set.
As the authoritative source of the information, submitting Agencies retain version control of public data sets and must comply with record retention schedules and requirements outlined by the New York City Department of Records and Information Services.
Public data to be made available per Local Law 11 of 2012 does not include any data set to which an Agency may deny access pursuant to the Freedom of Information Law (FOIL) or any other provision of a federal or state law, rule or regulation or local law. (That notwithstanding, by itself, Local Law 11 does not prohibit Agencies from releasing such FOIL-deniable data.) Records deniable under FOIL are those that:
(a) are specifically exempted from disclosure by state or federal statute;
(b) if disclosed would result in an unwarranted invasion of personal privacy;
(c) if disclosed would impair present or imminent contract awards or collective bargaining negotiations;
(d) are trade secrets or are submitted to an agency by a commercial enterprise or derived from information obtained from a commercial enterprise and which if disclosed would cause substantial injury to the competitive position of the subject enterprise;
(e) are compiled for law enforcement purposes and which if disclosed would:
    i. interfere with law enforcement investigations or judicial proceedings;
    ii. deprive a person of a right to a fair trial or impartial adjudication;
    iii. identify a confidential source or disclose confidential information relative to a criminal investigation; or
    iv. reveal criminal investigative techniques or procedures, except routine techniques and procedures;
(f) could if disclosed endanger the life or safety of any person;
(g) are inter-agency or intra-agency communications, except to the extent that such materials consist of:
    i. statistical or factual tabulations or data;
    ii. instructions to staff that affect the public;
    iii. final agency policy or determinations; or
    iv. external audits, including but not limited to audits performed by the comptroller and the federal government;
(h) are examination questions or answers that are requested prior to the final administration of such questions;
(i) if disclosed, would jeopardize an agency’s capacity to guarantee the security of its information technology assets, such assets encompassing both electronic information systems and infrastructures;
(j) are photographs, microphotographs, videotape or other recorded images prepared under authority of section eleven hundred eleven-a of the vehicle and traffic law (this exemption will be repealed effective December 1, 2014);
(k) are photographs, microphotographs, videotape or other recorded images prepared under authority of section eleven hundred eleven-b of the vehicle and traffic law (this exemption will be repealed effective December 1, 2014); or
(l) are photographs, microphotographs, videotape or other recorded images produced by a bus lane photo device prepared under authority of section eleven hundred eleven-c of the vehicle and traffic law (this exemption will be repealed effective September 20, 2015).
For subparagraphs (j) through (l) above, such information must be included on the date such subparagraphs will be repealed.
Local Law 11 specifies the following additional exemptions:
Nothing in the legislation, policies, or standards shall be deemed to prohibit an Agency from voluntarily disclosing information not otherwise defined as a public data set, nor shall it be deemed to prohibit an agency from making such voluntarily disclosed information accessible through the NYC OpenData portal.