Managing Quarryville Water Network Data

The following workflow can be tailored for many situations where we'd like to combine asset data from several shapefiles and assign attributes to a single output feature class. We'll look at the existing data, do a transformation, and create metadata for our new feature class.

Inspect current data

Water network data received from a previously contracted engineering firm consisted of several shapefiles corresponding to different asset types, e.g. valves, hydrants, and mains of differing diameters and materials. Let's look at these to see if we can make the data easier to manage going forward.

First, let's see what shapefiles we have:

In [4]:
import os, glob
qwDir = 'd:/GISPortable/SourceData/QuarryvilleSourceData/ArroEngWaterData'
os.chdir(qwDir)
print('Directory:\n  ' + os.getcwd())
print("Shapefiles in 'waterShapes' below:")
waterShapes = glob.glob('*.shp')
print(waterShapes)
Directory:
  d:\GISPortable\SourceData\QuarryvilleSourceData\ArroEngWaterData
Shapefiles in 'waterShapes' below:
['WATER 10 in PAWC.shp', 'WATER 10 in.shp', 'WATER 2 in CI.shp', 'WATER 2 in.shp', 'WATER 4 in AC.shp', 'WATER 4 in CI.shp', 'WATER 4 in DI.shp', 'WATER 6 in AC.shp', 'WATER 6 in DICL.shp', 'WATER 6 in PVC.shp', 'WATER 8 in CI.shp', 'WATER 8 in DICL.shp', 'WATER FIXTURES.shp', 'WATER HYDRANTS.shp', 'WATER PRIVATE.shp', 'WATER TANKS.shp', 'WATER VALVES.shp']

Next, a concise summary of each shapefile's geometry type, coordinate system, and fields:

In [5]:
import sys, arcpy   # sys is needed for sys.stdout.write below
for thisShape in waterShapes:
    shpDesc = arcpy.da.Describe(thisShape)
    geomType = shpDesc['shapeType']
    crsName = shpDesc['spatialReference'].name
    print('{} type: {} crs: {}'.format(thisShape, geomType, crsName))
    sys.stdout.write('  Fields: ')
    for thisField in shpDesc['fields']:
        sys.stdout.write(thisField.name + ' ')
    print()
    
WATER 10 in PAWC.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 10 in.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 2 in CI.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 2 in.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 4 in AC.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 4 in CI.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 4 in DI.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 6 in AC.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 6 in DICL.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 6 in PVC.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 8 in CI.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER 8 in DICL.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER FIXTURES.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER HYDRANTS.shp type: Point crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER PRIVATE.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER TANKS.shp type: Polyline crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 
WATER VALVES.shp type: Point crs: NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet
  Fields: FID Shape 

The output above confirms that all of these asset shapefiles contain geometry only; no fields have been added for attribute entry.

Data transformation: combine water mains shapefiles

We can combine the shapefiles for different types of water mains into one feature class and store size/material info as attributes. If we want to show a subset, such as asbestos cement pipes, as a separate layer, we can still do that using definition queries. Combining the mains into one feature class will also make it simpler to add further attributes as needed. We could create this new feature class in a geodatabase, but in this case we'll prioritize cross-platform portability and create a new shapefile instead.
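As a rough sketch of the definition query idea, the helper below builds a SQL where clause that selects a single material from the combined feature class. The function name `material_query` is hypothetical, and it assumes the `Material` text field that is added later in this workflow; the resulting string is the kind of expression one would enter as a layer's definition query.

```python
def material_query(material, field='Material'):
    """Build a SQL where clause selecting one pipe material.

    Single quotes inside the value are doubled, the usual SQL escape.
    """
    escaped = material.replace("'", "''")
    return "{} = '{}'".format(field, escaped)

# Asbestos cement mains only
print(material_query('AC'))   # Material = 'AC'
```

Keeping the clause in a small function makes it easy to generate one layer per material from the same feature class.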

Run the following cell to clear the output and temporary data when re-testing the workflow

In [ ]:
import os, arcpy
newMainShp = 'WaterMains.shp'
tempPipes = r'in_memory/tempPipes'
qwDir = 'd:/GISPortable/SourceData/QuarryvilleSourceData/ArroEngWaterData'
os.chdir(qwDir)
arcpy.management.Delete(newMainShp)
arcpy.management.Delete(tempPipes)

Create list of mains shapefiles (exclude hydrants etc)

In [7]:
# Create a list of the shapefiles for water mains
mainShapes = [ thisShp for thisShp in waterShapes if ' in' in thisShp ]
# all mains shapefiles have ' in' (inch) in name       ^
# except for the one below ...
mainShapes += [ 'WATER PRIVATE.shp' ]
print(mainShapes)
['WATER 10 in PAWC.shp', 'WATER 10 in.shp', 'WATER 2 in CI.shp', 'WATER 2 in.shp', 'WATER 4 in AC.shp', 'WATER 4 in CI.shp', 'WATER 4 in DI.shp', 'WATER 6 in AC.shp', 'WATER 6 in DICL.shp', 'WATER 6 in PVC.shp', 'WATER 8 in CI.shp', 'WATER 8 in DICL.shp', 'WATER PRIVATE.shp']
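The substring test `' in' in thisShp` works for this set of names, but it would also match a hypothetical name such as 'WATER intake.shp'. A slightly stricter sketch, assuming sized mains are always named 'WATER <diameter> in ...', uses a regular expression. The names `SIZED_MAIN` and `is_sized_main` are illustrative and not part of the original notebook.

```python
import re

# Matches names such as 'WATER 10 in PAWC.shp' or 'WATER 2 in.shp'
SIZED_MAIN = re.compile(r'^WATER \d+ in\b')

def is_sized_main(shp_name):
    """True if the file name starts with 'WATER <digits> in'."""
    return SIZED_MAIN.match(shp_name) is not None

sample = ['WATER 10 in PAWC.shp', 'WATER 2 in.shp',
          'WATER HYDRANTS.shp', 'WATER intake.shp']
print([s for s in sample if is_sized_main(s)])
# ['WATER 10 in PAWC.shp', 'WATER 2 in.shp']
```

'WATER PRIVATE.shp' would still need to be added by hand, as in the cell above, because its name carries no diameter.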

Create new output shapefile for mains

In 'Inspect current data' above we saw that all the mains shapefiles are 'Polyline', with the same coordinate system. So we can arbitrarily choose the first in the list 'mainShapes' as a template to create our new shapefile.

In [8]:
shpDesc = arcpy.da.Describe(mainShapes[0]) # to get geometry type and crs
newMainShp = 'WaterMains.shp'
# create our output feature                   v-- defined/used in previous cell
result = arcpy.management.CreateFeatureclass(qwDir, newMainShp,
    geometry_type=shpDesc['shapeType'],
    spatial_reference=shpDesc['spatialReference'] )
# note: we don't use 'template' parameter, that is only for copying *attributes* from our 'template'
#       and the existing shapefiles don't have any user entered (non-ID) attributes

# now add attributes to new, empty feature class
fieldsToAdd = [
    ['Material', 'TEXT', None, 20],
    ['Diameter', 'TEXT', None, 20],
    ['Owner', 'TEXT', None, 20],
]
result = arcpy.management.AddFields(newMainShp, fieldsToAdd)
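Choosing the first shapefile as the source of geometry type and coordinate system relies on every mains shapefile sharing both. A guard along the following lines could make that assumption explicit. It is only a sketch: `find_mismatches` is a hypothetical helper, and it assumes the (name, geometry type, CRS name) tuples have already been collected, for example from `arcpy.da.Describe` as in the inspection cell.

```python
def find_mismatches(descriptions):
    """Return names whose (geometry type, CRS name) differ from the first entry.

    `descriptions` is a list of (name, geometry_type, crs_name) tuples.
    """
    if not descriptions:
        return []
    reference = tuple(descriptions[0][1:])
    return [name for name, *rest in descriptions if tuple(rest) != reference]

crs = 'NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet'
described = [
    ('WATER 10 in.shp', 'Polyline', crs),
    ('WATER 2 in CI.shp', 'Polyline', crs),
    ('WATER HYDRANTS.shp', 'Point', crs),   # a point layer, not a main
]
print(find_mismatches(described))   # ['WATER HYDRANTS.shp']
```

An empty result means any entry in the list can safely serve as the source of geometry type and coordinate system.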

Process input shapefiles, calculate attributes and populate output shapefile

In [9]:
arcpy.env.workspace = qwDir
tempPipes = r'in_memory/tempPipes'
pipeTypes = [ 'CI', 'DI', 'DICL', 'AC', 'PVC' ]
for thisShp in mainShapes:
    print('Processing ' + thisShp)
    shpNameParts = thisShp.replace('.',' ').split(' ')
    diam = ''
    material = ''
    owner = ''
    for thisPart in shpNameParts:
        if thisPart.isdigit():
            diam = thisPart
        if thisPart in pipeTypes:
            material = "'{}'".format(thisPart)
        if thisPart in [ 'PRIVATE', 'PAWC' ]:
            owner = "'{}'".format(thisPart)

    print('  Pipe diameter: {} Material: {} Owner: {}'
      .format(diam, material, owner))
    arcpy.management.CopyFeatures(thisShp, tempPipes)
    result = arcpy.management.AddFields(tempPipes, fieldsToAdd)
    arcpy.management.CalculateFields(tempPipes, 'PYTHON3', fields=[
        ['Diameter', diam ],
        ['Material', material ],
        ['Owner', owner ],
    ])

    arcpy.management.Append(tempPipes, newMainShp, schema_type='NO_TEST')
    arcpy.management.Delete(tempPipes)
Processing WATER 10 in PAWC.shp
  Pipe diameter: 10 Material:  Owner: 'PAWC'
Processing WATER 10 in.shp
  Pipe diameter: 10 Material:  Owner: 
Processing WATER 2 in CI.shp
  Pipe diameter: 2 Material: 'CI' Owner: 
Processing WATER 2 in.shp
  Pipe diameter: 2 Material:  Owner: 
Processing WATER 4 in AC.shp
  Pipe diameter: 4 Material: 'AC' Owner: 
Processing WATER 4 in CI.shp
  Pipe diameter: 4 Material: 'CI' Owner: 
Processing WATER 4 in DI.shp
  Pipe diameter: 4 Material: 'DI' Owner: 
Processing WATER 6 in AC.shp
  Pipe diameter: 6 Material: 'AC' Owner: 
Processing WATER 6 in DICL.shp
  Pipe diameter: 6 Material: 'DICL' Owner: 
Processing WATER 6 in PVC.shp
  Pipe diameter: 6 Material: 'PVC' Owner: 
Processing WATER 8 in CI.shp
  Pipe diameter: 8 Material: 'CI' Owner: 
Processing WATER 8 in DICL.shp
  Pipe diameter: 8 Material: 'DICL' Owner: 
Processing WATER PRIVATE.shp
  Pipe diameter:  Material:  Owner: 'PRIVATE'
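The name-parsing part of the loop above can be pulled into a small function so it can be checked without arcpy. This is a sketch under the same naming assumptions as the loop (a numeric part is the diameter, and material and owner codes come from fixed lists); `parse_main_name`, `PIPE_TYPES` and `OWNERS` are illustrative names. Unlike the loop, it returns plain strings, without the extra quotes the loop adds so that each value can be used as an expression in CalculateFields.

```python
PIPE_TYPES = {'CI', 'DI', 'DICL', 'AC', 'PVC'}
OWNERS = {'PRIVATE', 'PAWC'}

def parse_main_name(shp_name):
    """Split a name like 'WATER 6 in DICL.shp' into attribute values."""
    attrs = {'Diameter': '', 'Material': '', 'Owner': ''}
    stem = shp_name.rsplit('.', 1)[0]      # drop the extension
    for part in stem.split():
        if part.isdigit():
            attrs['Diameter'] = part
        elif part in PIPE_TYPES:
            attrs['Material'] = part
        elif part in OWNERS:
            attrs['Owner'] = part
    return attrs

print(parse_main_name('WATER 10 in PAWC.shp'))
# {'Diameter': '10', 'Material': '', 'Owner': 'PAWC'}
```

Separating parsing from the geoprocessing calls also makes it easier to spot names that yield no attributes at all before any data is copied.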

Check our new combined water main shapefile

We can just add our new water main shapefile to a map in ArcGIS and check the attributes there, but I'll use one of my existing Python scripts to get a quick summary below.

In [10]:
import sys
sys.path.append('c:/Users/dwm/MyPython')
from dmtdev.agprotools import datasetView
datasetView(newMainShp,['Material','Diameter','Owner'])
Out[10]:
[(' ', '10', 'PAWC'),
 (' ', '10', ' '),
 ('CI', '2', ' '),
 (' ', '2', ' '),
 ('AC', '4', ' '),
 ('AC', '4', ' '),
 ('AC', '4', ' '),
 ('CI', '4', ' '),
 ('DI', '4', ' '),
 ('DI', '4', ' '),
 ('DI', '4', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('AC', '6', ' '),
 ('DICL', '6', ' '),
 ('DICL', '6', ' '),
 ('DICL', '6', ' '),
 ('DICL', '6', ' '),
 ('DICL', '6', ' '),
 ('DICL', '6', ' '),
 ('PVC', '6', ' '),
 ('CI', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 ('DICL', '8', ' '),
 (' ', ' ', 'PRIVATE')]

Note that the shapefile name for the single 'PRIVATE' water main feature gave no diameter or material information. Material is also unspecified for three of the first four features in the list, including the PA American Water Co. (PAWC) main. Most of the features (75 of 95) are 6" asbestos cement (AC) or 8" ductile iron cement lined (DICL).
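A long listing like the one above is easier to read as counts per attribute combination. The sketch below assumes the rows are available as a list of (Material, Diameter, Owner) tuples, which is how the output above appears; `summarize` is a hypothetical helper and the sample rows are a made-up subset for illustration, not the real data.

```python
from collections import Counter

def summarize(rows):
    """Count features per (Material, Diameter, Owner) combination, most common first."""
    return Counter(rows).most_common()

# Illustrative rows only
rows = ([('AC', '6', ' ')] * 3
        + [('DICL', '8', ' ')] * 2
        + [(' ', ' ', 'PRIVATE')])

for combo, count in summarize(rows):
    print(count, combo)
```

Applied to the full result, this would reduce the listing to one line per distinct combination.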

Add metadata to new shapefile

In [11]:
mainMeta = arcpy.metadata.Metadata(newMainShp)
mainMeta.title = 'Quarryville Water Mains'
mainMeta.summary = 'Water Mains in Quarryville PA for illustrative purposes only'
metaDescString = '''
This shapefile was created by aggregating data from source shapefiles
supplied by Arro Engineering (contact Darrell Becker).
Those files were used by Arro to create water network map in March 2018.
Source shapefiles are polyline geometry w/o additional attributes,
filenames indicate size & material and/or ownership.
Source filenames follow:
'''
for thisShp in mainShapes:
    metaDescString += thisShp + '\n'
mainMeta.description = metaDescString
mainMeta.save()