Interview Transcript

Speaker 0:43
Yeah, certainly. It's kind of a long road. If you knew me as a child, you wouldn't be surprised I work with computers now. I've always been very interested in computers, from a very early age: programming and electronics and ham radio, all sorts of stuff like that. In college I was originally an English major, actually, and then I switched to biology. I worked in a biology lab for a couple of years and realized that's not really what I wanted to do with my life. I was more interested in the outdoors and environmental science, which was part of my degree. So I learned a little bit of GIS, just kind of studied it on the side, and I got an internship through AmeriCorps and the Student Conservation Association with the New York City Department of Environmental Protection. They needed someone to do GIS work for them, and I got it. This was based in Catskill, upstate New York, kind of the Hudson Valley, where New York City's drinking water comes from. I worked with this team, and they hired me on full time eventually. And I sort of moved into doing software development there. They had some sort of custom software that worked with their GIS system to collect data, and I worked on that. After a while I was like, you know, I want to do something else. I was looking for a position doing GIS work just sort of in general, and it needed to be near New York City, because that's where my partner works, or worked. I found this job at Princeton and I was like, hey, I think I could do that. And then it kind of developed, right? There was some software development, some GIS, and then over the years it has become much more about software development.
I don't really do too much GIS stuff anymore. But that's kind of how I ended up here. And I had no idea about working in libraries or any of that. I've sort of learned as I've gone along.

Speaker 3:21
Yeah. So in a typical technical day in your life, do you find yourself working more with the data? And when we're speaking about design, are you thinking more about what the best way is to display that data? What does your process look like, and what tools do you typically find yourself using?

Speaker 3:46
Yeah, you know, the design end of things, this is kind of a tough question, because I do a lot of backend work, right? I think I have a pretty good eye. My wife is an artist, and my dad is actually an artist too, and an architect, so I kind of grew up designing things. On a typical day I don't necessarily do a lot of design work. But since we do web development, we do think a lot about how to present information, and I think I have some good ideas about how to do it. Sometimes we'll do mock-ups, and we've kind of changed our toolset over the years on that. You can sometimes do just a Google Drawing, where you just kind of outline the shapes. I haven't actually done that in a while. There are a couple of tools we use that I actually can't remember the names of, some online UI mock-up tools. So especially if we're proposing a new feature, like a new form or a new screen, or a really big redesign of something, we'll do a mock-up and then present it to each other on the team and say, oh, that's not going to work, or that is going to work. Oh yeah, we do a lot of screenshots, actually. In fact, we were doing it this week. I was doing that with the student that I'm working with, that I'm mentoring this year. He was working on our maps portal, actually, and we were trying to figure out our home screen. There was some feedback about changing the way the buttons are arranged, and how they look, and the text and stuff. So we were trying a few different things, and he did it up in the code, because it's actually pretty easy to make some UI changes in the code.
And then on his development machine he just took a little screenshot, did that a couple of times, presented it to the group, and got some feedback. Now we have some feedback that this didn't work and this worked, and so next week I can kind of go back and incorporate that. We do user testing too. Sorry, interrupt me at any time if I'm just... (No, this sounds good.) Okay. We do user testing. A couple of months ago, I can't remember exactly when, but not too long ago, we did a whole round of user testing on our maps portal. Shawn could talk a little bit more about that, since he was the one that organized it, but they basically created some research questions, a script, essentially: okay, we want you to do this task, go find this data in our maps portal website. He booked appointments and watched as the participants navigated through that process, took notes, and then did a debrief at the end with them and said, hey, what did you notice as you were going through this? Did you actually complete the task? What were the roadblocks? That got translated into a series of documents. We reviewed them, and then we tried to distill them down into GitHub issues on our repository that we could work on over time, sort of prioritized. And there were quite a few. The overall layout of the site can be a little confusing sometimes, or the map is too big, or what does this do? Or the labeling is strange and they don't understand it, or there's jargon there. That was kind of what we were working on this week: what does that phrase mean? And who are our users? Are they technical specialists, are they just the general public, or both? And we just try to incorporate that.
So that's kind of a new thing that we're doing a lot more of. We have done that some over time, but in the last six months to a year we've really put a focus on doing user testing and things like that for applications.

Speaker 8:23
That's awesome. So what I'm wondering is, you're working with geospatial data. Let's say a client, or Princeton, or whoever, gives you guys a problem, like, I want to have this map added. What does that process of getting that map uploaded look like? And then I guess... Whoa,

Speaker 8:48
Yeah, interesting. So for us, most of that comes from... you know, we're kind of like the support team, right? We develop the systems. I can show you what that looks like, if that would be helpful. Let me share my screen. All right. Actually, let me do this on a staging site. Well, I'll do it here first. So this is Figgy, which is our digital repository. It's typically admin only. If you work in the library, or you're a curator or a librarian or a specialist, this is where you work to add digital content. And we have a whole section here for different types of digital content. Scanned maps would be just like, you go in the library and you find a paper map and you scan it in, and that's it. And then vector and raster resources are different types of geospatial data that you might use for analysis in, again, something called a GIS. So this is kind of self-service, right? That's how we've organized it. We have a whole team of people that scans old maps. And then we have people, sometimes the same people, that have collected geospatial data. We have a librarian who collects this data depending on what he thinks faculty and students at Princeton want, and maybe I get a special request for a certain type of data. Say he gets this data on a CD-ROM or something, or downloads it from the internet. Then his staff will go in and say, oh, we have a new, say it's a vector, you know, point, line, polygon, and they can enter metadata about the thing. I'm not going to do this here, because that's our production system, but I'll just give you a sense of what this looks like. Let's say I wanted to add this vector data, and I can just say, "Elliot test data." I've been doing this a lot today, so I have six. And let's say I know that this data is in Boston somewhere. Zoom in.
So this is a little UI element that we added to help people choose a location for the data. It's called a bounding box. And then we want to say that this is going to be public, like anyone can view and download it. This is not restricted data. Then it creates a record, and now I want to upload my file. There are a couple of ways to do it. Typically, they put it on a server somewhere, so you can say, there's all this data out here that I could upload. Like, we have all this Cuba data and stuff like that. You could also do drag and drop for smaller datasets. Let's see, I had one I was working on today. Just a second. So this is test data, and I can just drag it and drop it to upload. Then it does some stuff in the background, where it tries to identify the type of file it is, and it's already done that. Then it creates what we call derivatives, so it makes some copies of the data in different formats. And for this type it will make a thumbnail, so it just makes this little thumbnail from the thing I uploaded. And then there's a workflow. Let's just say it's done, everything looks great. We could add more information in here, but it's got a title, and it's got a location and the data, and that's all we really need. We submit it.
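The minimum record described here (a title, a bounding box, and a public or restricted visibility) could be modeled roughly as follows. This is only a sketch: `BoundingBox`, `DraftRecord`, the visibility values, and the Boston coordinates are illustrative assumptions, not the repository's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A rectangle in decimal degrees (west/east are longitudes, south/north are latitudes)."""
    west: float
    south: float
    east: float
    north: float

    def __post_init__(self):
        # Reject coordinates that cannot exist on a map.
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            raise ValueError("longitude must be between -180 and 180")
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("latitude must be between -90 and 90, with south <= north")


@dataclass
class DraftRecord:
    """Hypothetical stand-in for what a record needs before a file is uploaded."""
    title: str
    bbox: BoundingBox
    visibility: str = "public"  # assumed values: "public" or "restricted"


# Roughly the Boston area; the numbers are illustrative only.
record = DraftRecord(
    title="Elliot test data",
    bbox=BoundingBox(west=-71.19, south=42.23, east=-70.99, north=42.40),
)
```

Validating the box when it is created means a bad rectangle is caught at data entry instead of surfacing later as a record that never matches a spatial search.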

Speaker 13:21
And then if we come over to our maps portal and we search for this thing, here, it synchronizes in between, and now we have a record for the thing we just created. It also uploads the data onto a kind of server, so you can actually preview the data and things like that. So that's kind of the basics, right? That's for data, and other types of maps are very similar. People do come to us, and to me in particular, to help solve a problem that they might have. But often they are kind of self-sufficient. If they have errors and things, or there's a new type of data that they want to get in here, then they will come to us. I can show you what we're working on. Geez, where is it? Yes. So one thing that we have done that is UI related: we've been working on what are called mosaics. So this is the catalog, and this is a set of maps. These are paper maps published by the Army Map Service. There are about 250 or so in this one set, and these are all of India. We have this viewer here, as part of the thing that Figgy helps us to generate, so you can look at the individual sheets in the map set. But then there was a request for enabling the staff to stitch these maps together into one image that can be overlaid in a GIS, and then to display it in the catalog as well. So we had this viewer before, and we needed to find a way to add another view of the data. We ended up doing these little tabs. If you click over to the map tab, now you'll see they're all stitched together. This is called a raster mosaic. We've provided a process for them to cut the edges off the maps, to just have the map and nothing else. The edge is called the collar. And then they're all integrated, right?
So this was kind of a big UI thing that we worked on this week: figuring out the best way to get both of these views of the data onto a page like the catalog in a way that was easy for users to understand. We haven't done any user testing on this yet, but I imagine we will in the future. I don't know if I've answered your question.
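One small piece of the mosaic idea can be sketched without any imaging library: once the collars are trimmed, the footprint of the stitched image is the union of the individual sheet extents. The function name and the sheet coordinates below are made up for illustration, and the actual stitching of pixels (normally done with a raster toolkit) is deliberately left out.

```python
def mosaic_extent(sheet_extents):
    """Return the (west, south, east, north) rectangle that covers every sheet.

    Each sheet extent is a (west, south, east, north) tuple in decimal degrees.
    """
    if not sheet_extents:
        raise ValueError("at least one sheet extent is required")
    wests, souths, easts, norths = zip(*sheet_extents)
    return (min(wests), min(souths), max(easts), max(norths))


# Three adjacent one-degree sheets; the values are illustrative only.
sheets = [
    (72.0, 18.0, 73.0, 19.0),
    (73.0, 18.0, 74.0, 19.0),
    (72.0, 19.0, 73.0, 20.0),
]
extent = mosaic_extent(sheets)
```

The resulting rectangle is also a natural bounding box for the mosaic's own catalog record, which is what lets the stitched image be found by a spatial search.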

Speaker 16:39
That's a very interesting intersection. I didn't know that was even possible, to clip all of the maps like that. That's really interesting. So one other question I have for you: I noticed that usually when you search for a resource, there are a lot of links. Do you ever find yourself having to add or remove a link, or thinking about related resources, whenever you're putting something in the app? Oh,

Speaker 17:13
interesting. Could you elaborate a little bit more? Like? I think I understand, but maybe clarify a little bit.

Speaker 17:22
Yeah. So for the site that you just pulled up, I saw, I think it was under potentially related topics or authors, some linking happening. I was wondering if you were involved with that, and if you ever find yourself having to look into how interconnected topics are?

Speaker 17:41
Yeah, certainly. I'm going to share again. So there were two. Yeah, this is. So did you mean in a catalog? Or did you mean in our maps portal here?

Speaker 17:54
Oh, yeah. In the maps portal?

Speaker 17:56
Yeah. Yeah, certainly, that that's an issue. Linking out is. So let me go to the catalog one.

Speaker 18:14
Oh, yeah. I think those were the links I was seeing before actually.

Speaker 18:17
Yeah, yeah, totally. They do happen. And they do happen in the maps portal as well, just not on this record, because we don't actually have anything else in here, really, besides the title, any metadata that we can link out to. So just in terms of the catalog, the subject headings here are part of a controlled vocabulary. They're typically the Library of Congress Subject Headings. The Library of Congress basically has this big reference, and if you're a cataloger cataloging books or manuscripts or maps, you're supposed to use those subject headings in order to standardize what goes into the subject field. And if you standardize that way, that allows us to do things like add, this is called a facet. This is faceted search. So if I click on that, I'll get everything in the catalog that has the same subject facet, right? A lot of that is done by the catalogers. They're the experts in their area, and they know what kinds of subjects to use. For geospatial data there are different kinds of controlled vocabularies, depending on what standard you're trying to use to describe your data. I could spend hours talking about this. But basically, you want to narrow down the set of potential values, and that makes it easier to link out between different resources. As far as how other things are linked... just a second. That was our staging site. Let me go here. This is our maps portal. And this is a Sanborn. I just want Princeton's. So I was showing you a set of maps. Let me go... So a really popular set of objects that we have in our portal are these fire insurance maps. We have a pretty complete set, from multiple years, of the state of New Jersey.
And I don't know if you've ever seen these before, but they're basically from about 1880 to the 1940s or 50s. Have you seen these before? (I have not seen these.) Okay, yeah, so these are great, especially if you're doing historical research, or even if you want to research your own house. These companies put out maps of cities across the country, and some other countries outside the United States too, actually, and they were for insurance companies to use. So there are sections of streets, and then information about the structures that were there at that time. You can see things like, there's a church, or maybe there are some businesses, the YWCA, right? These are really used quite a bit in our collection. And they're not just single maps, they're a set. They come in a book, so we scan them together, and then they get presented as one object. But this one actually has, how many sheets, I don't know, 12 sheets. So there's a hierarchical relationship. There's the book, or the atlas, let's call it, the set of them, which is what we're looking at, and then the individual sheets. We do this all the time: we're trying to figure out how to relate individual things to the parent object, we'll call it. So you can look at the individual sheets, and you can relate them.
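The parent and child relationship described here, one atlas presented as a single object that owns its individual sheets, could be modeled along these lines. `Atlas`, `Sheet`, and the title are hypothetical names chosen for the sketch and say nothing about how the repository actually stores the relationship.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sheet:
    """One scanned page of an atlas, with a link back to its parent."""
    label: str
    # Excluded from repr and comparison so the parent/child cycle stays harmless.
    parent: Optional["Atlas"] = field(default=None, repr=False, compare=False)


@dataclass
class Atlas:
    """The parent object: the bound set that is presented as one record."""
    title: str
    sheets: List[Sheet] = field(default_factory=list)

    def add_sheet(self, label: str) -> Sheet:
        """Create a sheet that knows its parent and attach it to this atlas."""
        sheet = Sheet(label=label, parent=self)
        self.sheets.append(sheet)
        return sheet


# A 12-sheet atlas, matching the sheet count mentioned in the interview.
atlas = Atlas(title="Example fire insurance atlas")
for number in range(1, 13):
    atlas.add_sheet(f"Sheet {number}")
```

Keeping a reference in both directions means a viewer can go from the atlas to any sheet, and a search hit on a single sheet can still lead back to the set it belongs to.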

Speaker 22:51
Yes, there's a lot. super interesting. Wait, sorry, what were you gonna say? No, go

Speaker 22:55
ahead. Go ahead. Yeah, so

Speaker 22:57
I find this super interesting. So our course is about graphic design, but the second title is "links," and we're really exploring, in some ways, how links work: how things in nature are connected and structured. So I find it very interesting that maps are first grouped by book, and then from there you further have a hierarchy. So I'm wondering, if I wanted to look at all the maps, no matter what type of map or what type of book, all the maps for a particular area, do you allow me to search in that way?

Speaker 23:36
Yeah, so here's how you would do it in our portal here. Let's say we have a location. I live in New Orleans, and we can zoom into New Orleans here. So this will do what we call a spatial search. It draws a box, and all of the records in the portal have a location, so we can do a search that says anything that intersects this location is going to be returned. And you can see there are all kinds of different things. We have some census data for New Orleans, which shows up slowly. What else do we have here? We can facet by, I want to make sure it's public so we can see it all, and we can facet by the type of thing it is. See if there's anything interesting. This is a good one. Seen this map before? So this is a map from 1916 of the lower Mississippi River. So that's how you would do it. That's kind of the advantage of using this interface. Many of these items are in the catalog, not all of them, in our library catalog here, but there you can't do a spatial search. You could certainly type "Orleans." It's a little pokey right now. Yeah, we could do "New Orleans," and then we could facet by map, and that might get us a few things. So that's something you could do, but it probably wouldn't get you everything that you might get with a spatial search like this, where we're saying, things that are overlapping this area. And I guess these are links. These are links out to record pages that, for the most part, are pretty stable. The URL is probably not going to change. It could, I suppose. But yeah. We also talk a lot about, I'm trying to think how to present this in terms of maps, we talk a lot about linked data. I'm sure you're familiar with that in the library world. We do some stuff with linked data.
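The spatial search demonstrated in this answer (return anything whose location intersects the box drawn on the map) comes down to a rectangle overlap test. The sketch below shows that test in plain Python. The record titles and coordinates are invented for illustration, and it ignores boxes that cross the antimeridian. In the portal itself this work is done by the search engine, not by application code like this.

```python
def intersects(a, b):
    """True when two (west, south, east, north) boxes overlap or touch."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (
        a_west <= b_east and b_west <= a_east      # longitude ranges overlap
        and a_south <= b_north and b_south <= a_north  # latitude ranges overlap
    )


def spatial_search(records, search_box):
    """Return every record whose bounding box intersects the search box."""
    return [r for r in records if intersects(r["bbox"], search_box)]


# Illustrative records; the titles and coordinates are assumptions.
records = [
    {"title": "New Orleans census data", "bbox": (-90.14, 29.87, -89.63, 30.20)},
    {"title": "Lower Mississippi River map", "bbox": (-91.70, 28.90, -89.00, 33.00)},
    {"title": "Boston test data", "bbox": (-71.19, 42.23, -70.99, 42.40)},
]
new_orleans_box = (-90.20, 29.90, -89.90, 30.10)
hits = spatial_search(records, new_orleans_box)
```

This is why the spatial search finds the regional river map as well as the city data: a record matches because its footprint overlaps the area, not because its title contains the words "New Orleans".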
There are some catalogers that are really working on developing linked data views into our catalog. We do a little bit. One thing that we do, can I show you? Oh, I can, a little bit. So in our catalog, we have a sub-collection of coins and medals.

Speaker 27:06
Called numismatics. That's not what we want. We want to look at our coins.

Speaker 27:13
All right. So we have a whole collection of coins that they're continually photographing and cataloging down in the Special Collections area. And they're pretty cool. I mean, there's really interesting stuff there. Let's see publication year. That's kind of a funny way to put it.

Speaker 27:34
When it was published,

Speaker 27:36
Yeah, when it was minted, really. Anyway, that's not really working very well, is it? We have a lot of really old Roman coins and stuff. Yeah, like this, right? There you go. So you can see the coin, and the photos taken of the front and back. And this data is collected in a way that follows a kind of standard. There's a group, there's a standard, called Nomisma. Is this the right one? Nope. This one. Right. So it's a schema, we'll call it a linked data schema, for cataloging and sharing information about coins and numismatic objects. Basically, the idea is that if you present your data with this kind of, we call it a vocabulary, then you could send the data out to some other group that could put it in their database and share it that way. And there's a linked data schema for doing that. It's called NUDS, which is terrible. But anyway. I don't know how much you know about linked data, but this all uses RDF, and you can represent data in a very controlled and interoperable manner. And we do have someone run a script occasionally that pulls all the coin data out of our system and generates all these XML documents that we send off to some institution somewhere, and I don't know what they do with it. This was a very long project, and I've kind of gotten that out of my head. There are some really neat, I can never remember how to get there, but there are some really neat search portals for numismatic data. It's cool because you can look at maps of the ancient world and see where coins were minted and where they were discovered, that kind of thing.
It's like, they were minted over here in Syria, and then they were discovered in a hoard over in, I don't know, Germany somewhere, right? It is really interesting. And I think the linked nature of that data makes it really good for that kind of research, archaeological research and stuff. So we participate in that a little bit, you know. But, I'm sorry, it's certainly important. Yeah.
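The export script mentioned here (pull the coin data out of the system and generate XML documents to send elsewhere) might look something like the sketch below. The element names, the record fields, and the sample coin are invented for illustration. They do not follow the real numismatic schema, which is far more detailed.

```python
import xml.etree.ElementTree as ET


def coin_to_xml(coin):
    """Serialize one coin record to a small XML string.

    The element names are placeholders, not the actual exchange schema.
    """
    root = ET.Element("coin", attrib={"id": coin["id"]})
    for key in ("denomination", "mint", "findspot"):
        value = coin.get(key)
        if value:  # skip fields the cataloger left empty
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


# An invented record echoing the example of a coin minted in one region
# and discovered in a hoard in another.
sample_coin = {
    "id": "coin-0001",
    "denomination": "Denarius",
    "mint": "Syria",
    "findspot": "Germany",
}
document = coin_to_xml(sample_coin)
```

Because every institution emits the same element names and, in real linked data, the same identifiers for mints and findspots, a shared portal can combine records from many collections and plot where coins were minted against where they were found.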

PART 2

Speaker 3:25
Awesome. So one last question that I have: do you know how the spatial search works, a bit of what's happening behind the scenes?

Speaker 3:37
Oh, absolutely. Yeah. Let's take a look here. Just a second. Let me see if I can find that. Oh, that's not a test one. This is testing. We did a record here called Elliot something or other. Okay. So this is the page as it looks in our digital repository. And you'll notice the first thing I did was assign it a bounding box. It's basically just a polygon that says this data is generally in this location. Anything that goes into our portal has to have that bounding box, because it's really crucial to the search. So this is how it looks in the portal, and then we can get a view of the JSON. Each record has a JSON document in a particular schema. And you'll see somewhere in here, where is the most important one? Here we go. There's this field, or property, in this document called solr_geom. It's in a particular format, but it basically tells you what the bounding box is. So each one of these is, like, north, south, east, west on a map. These are latitudes and longitudes. And then as our search engine, we use Solr. Have you done any work with Solr before? (No.) Okay, so Solr is an open source search engine. It's really widely used. Even commercial websites use it. If you go to lowes.com and you search for hammers, there's a good chance the data has been indexed in something like Solr. There's another one called Elasticsearch, and there are some others. It's really built for indexing data for very fast searching and recall. You can search by properties, or keywords, or titles. It's very flexible. And Solr has a geospatial searching function, so we can do all kinds of different search queries. Searching based on a bounding box is what we were talking about, but we can also say, hey, show me everything that's near a point.
We can tell Solr, I am at this location, latitude, longitude; query everything in your index within five kilometers and return those documents to me. Everything gets indexed as a document. So we have a record, we transform it into this little JSON, and then we tell Solr to index it. So it's kind of like a database, but we don't use it like a permanent database. It's a very fast index of the data that we want to search. Let's see. There's all kinds of cool stuff you can do. Every time they release a new version of Solr, there's more stuff you can do with it geospatially. So we index the data, and then we do a query. Where did it go? It's going to open up a new tab. Staging. So if I do a query like this, I'm going to zoom in on the map here. There we go. And you'll see up here, the JavaScript adds this little parameter called bbox. And then on the back end, on our server, it knows, okay, the user wants to search Solr within this bounding box, and it will generate a query that we send to Solr with these coordinates. Then Solr sends back the documents that are within that search area.
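The two kinds of query described in this answer, "everything intersecting this box" and "everything within five kilometers of this point", can be expressed as Solr request parameters. The sketch below only builds the parameters and sends nothing. The `Intersects(ENVELOPE(...))` predicate and the `{!geofilt}` filter are standard Solr spatial syntax. The field names are assumptions: `solr_geom` is the name as heard in the interview, and `location_point` is purely hypothetical. The server-side code in the actual portal is not shown in the interview and may differ.

```python
from urllib.parse import urlencode

GEOM_FIELD = "solr_geom"  # bounding box field name as heard in the interview


def bbox_query_params(west, south, east, north, field=GEOM_FIELD):
    """Parameters for 'anything whose geometry intersects this rectangle'.

    Solr's ENVELOPE syntax orders the values as minX, maxX, maxY, minY,
    that is: west, east, north, south.
    """
    envelope = f"ENVELOPE({west}, {east}, {north}, {south})"
    return {"q": "*:*", "fq": f'{field}:"Intersects({envelope})"'}


def radius_query_params(lat, lon, km, field):
    """Parameters for 'anything within km kilometers of this point'."""
    return {"q": "*:*", "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={km}}}"}


box_params = bbox_query_params(west=-90.2, south=29.9, east=-89.9, north=30.1)
near_params = radius_query_params(29.95, -90.07, 5, field="location_point")

# What would be appended to a Solr select URL (nothing is sent here).
query_string = urlencode(box_params)
```

Both filters go in `fq` rather than `q` so that they narrow the result set without affecting relevance scoring, which leaves `q` free for the user's keywords.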

Speaker 8:31
Right. So every time you make updates, for example when we made the new Elliot test map, at some point that's going to be sent to Solr so it can index it as well?

Speaker 8:47
Exactly. Yeah. So over here in our digital repository, where I uploaded the data and assigned it a title and the bounding box, as soon as I hit this action called "complete" and I said submit, on the back end it takes that data and that metadata and generates this little document here. Then it sends it to our Solr instance, and as soon as it's in Solr, it's searchable in this catalog. Yep. Yeah, Solr is great. I know the hip one to use now is called Elasticsearch, and we'd love to use it, but everything that we use is based on Solr right now. So it's kind of like, you go with what you know. And they're both based on a project called Lucene, excuse me, which is also an Apache project, but a sort of lower-level library for doing search, basically.
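The step described here (on "complete", turn the record's metadata into a small JSON document and send it to Solr) could be sketched as follows. Posting a JSON array of documents to a core's `/update` handler is standard Solr behavior. The field names other than `solr_geom`, the core name, and the URL are assumptions made for the example, and the request is only constructed, never sent.

```python
import json
from urllib.request import Request


def to_index_document(record):
    """Flatten a repository record into the JSON document Solr will index."""
    west, south, east, north = record["bbox"]
    return {
        "id": record["id"],        # hypothetical field name
        "title": record["title"],  # hypothetical field name
        # ENVELOPE order is minX, maxX, maxY, minY (west, east, north, south).
        "solr_geom": f"ENVELOPE({west}, {east}, {north}, {south})",
    }


def build_update_request(solr_url, core, documents):
    """Build (but do not send) a POST that asks Solr to index the documents."""
    return Request(
        url=f"{solr_url}/{core}/update?commit=true",
        data=json.dumps(documents).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


record = {
    "id": "elliot-test-6",
    "title": "Elliot test data",
    "bbox": (-71.19, 42.23, -70.99, 42.40),  # west, south, east, north
}
request = build_update_request(
    "http://localhost:8983/solr",  # assumed local Solr address
    "maps",                        # hypothetical core name
    [to_index_document(record)],
)
```

Passing the request to `urllib.request.urlopen` would perform the indexing. Keeping construction separate from sending makes the transformation easy to inspect and test without a running search server.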

Speaker 10:06
Interesting. Okay, I didn't know that. Do you guys use Solr for your catalog searches as well? Or I guess you don't need it. So

Speaker 10:17
Yeah, we do. Our main library catalog has a Solr back end, and the maps portal does as well. So when you do a search in the catalog for, you know, New Orleans, it's actually creating a Solr query, communicating with the server, and then returning documents that get rendered like this.
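A catalog keyword search with facets, like typing "New Orleans" and then narrowing to maps, might translate into Solr parameters roughly like this. `facet`, `facet.field`, and `fq` are standard Solr parameters. The field names `format` and `subject_facet` are placeholders, since the catalog's real schema is not shown in the interview.

```python
from urllib.parse import urlencode


def catalog_query_params(keywords, facet_fields, selected=None):
    """Build Solr parameters for a keyword search with facet counts.

    `selected` maps a facet field to the value the user clicked on.
    """
    params = [("q", keywords), ("facet", "true")]
    # Ask Solr to count the values of each facet field in the result set.
    params += [("facet.field", name) for name in facet_fields]
    # Each clicked facet becomes a filter query that narrows the results.
    for field, value in (selected or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return params


params = catalog_query_params(
    "New Orleans",
    facet_fields=["format", "subject_facet"],  # hypothetical field names
    selected={"format": "Map"},
)
query_string = urlencode(params)
```

A list of pairs is used instead of a dictionary because `facet.field` and `fq` can legitimately appear more than once in the same request.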

Speaker 10:41
That's awesome. That's all the questions I have for now. But I learned so much. I didn't know this was how organizing it and writing queries worked. It was kind of mysterious.

Speaker 10:55
Yeah, it can be kind of mysterious. Absolutely. And I've been working with it for quite a while. Like anything, if you really dive into it you can get really deep. I can more or less make Solr work for me, but I don't know everything about how it all works. But it's a pretty fascinating topic. I think Lucene and Solr really started to get developed as commercial search engines became a thing, you know, like Google and the ones before it, maybe before your time, the kind of early ones. People were like, oh yeah, maybe we can make search engines for us, you know.

Speaker 11:43
Awesome. Thank you so much for letting me interview you about your experiences.

Speaker 11:52
You're welcome. And if you have any more questions, I'm definitely open. Okay, I'm gonna stop