12 Collections of Structured Data
The set of messages matching a tag.
The list of messages in a conversation.
The set of friends of a user.
Do Now!
How are collective data different from structured data?
In structured data, we have a fixed number of possibly different kinds of values. In collective data, we have a variable number the same kind of value. For instance, we don’t say up front how many songs must be in a playlist or how many pages a user can have; but every one of them must be a song or a page. (A page may, of course, may be conditionally defined, but ultimately everything in the collection is still a page.)
Observe that we’ve mentioned both sets and lists above. The difference between a set and a list is that a set has no order, but a list has an order. This distinction is not vital now but we will return to it later [Sets as Collective Data].
A family tree of people.
The filesystem on your computer.
A seating chart at a party.
A social network of pages.
We have already seen tables [Introduction to Tabular Data], which are a form of collective, structured data. Now we will look at a few more, and how to program them.
12.1 Lists as Collective Data
song-list = [list: lver, so, wnkkhs]
check: song-list.length() is 3 song-list.first is lver end
Thus, what we have seen earlier about building functions over lists
[Processing Lists] applies here too. To illustrate, suppose
we wish to write the function oldest-song-age, which consumes a
list of songs and produces the oldest song in the list. (There may be
more than one song from the same year; the age—
oldest-song([list: lver, so, wnkkhs]) is lvar oldest-song([list: so, wnkkhs]) is wnkkhs oldest-song([list: wnkkhs]) is wnkkhs oldest-song([list: ]) is ???
What do we write in the last case? Recall that we saw this problem earlier [my-max: Examples]: there is no answer in the empty case. In fact, the computation here is remarkably similar to that of my-max, because it is essentially the same computation, just asking for the minimum year (which would make the song the oldest).
From our examples, we can see a solution structure echoing that of my-max. For the empty list, we signal an error. Otherwise, we compute the oldest song in the rest of the list, and compare its year against that of the first. Whichever has the older year is the answer.
fun oldest-song(sl :: List<ITunesSong>) -> ITunesSong: cases (List) sl: | empty => raise("not defined for empty song lists") | link(f, r) => cases (List) r: | empty => f | else => osr = oldest-song(r) if osr.year < f.year: osr else: f end end end end
Note that there is no guarantee there will be only oldest song, and this is reflected in the possibility that osr.year may equal f.year. However, our problem statement allowed us to pick just one such song, which is what we’ve done.
Do Now!
Modify the solution above to oldest-song-age, which computes the age of the oldest song(s).
fun oldest-song-age(sl :: List<ITunesSong>) -> Number: os = oldest-song(sl) song-age(os) where: oldest-song-age(song-list) is 71 end
12.2 Sets as Collective Data
Your Web browser records which Web pages you’ve visited, and some Web sites use this information to color visited links differently than ones you haven’t seen. This color is typically independent of how many times you have visited the page.
During an election, a poll agent might record that you have voted, but does not need to record how many times you have voted, and does not care about the order in which people vote.
song-set = [set: lver, so, wnkkhs]
Do Now!
How can we tell whether Pyret cares about the order?
check: song-set2 = [set: so, wnkkhs, lver] song-set is song-set2 end
Exercise
How many different orders are there?
check: song-set3 = [set: lver, so, wnkkhs, so, so, lver, so] song-set is song-set3 song-set3.size() is 3 end
12.2.1 Picking Elements from Sets
This lack of an ordering, however, poses a problem. With lists, it was meaningful to talk about the “first” and corresponding “rest”. By definition, with sets there is not “first” element. In fact, Pyret does not even offer fields similar to first and rest. In its place is something a little more accurate but complex.
The .pick method returns a random element of a set. It produces a value of type Pick (which we get from the pick library). When we pick an element, there are two possibilities. One is that the set is empty (analogous to a list being empty), which gives us a pick-none value. The other option is called pick-some, which gives us an actual member of the set.
fun an-elt(s :: Set): cases (Pick) s.pick(): | pick-none => error("empty set") | pick-some(e, r) => e end end
Do Now!
Can you guess why we didn’t write examples for an-elt?
Do Now!
Run an-elt(song-set). What element do you get?
Run it again. Run it five more times.
Do you get the same element every time?
Do Now!
Given that an-elt does not return a predictable element, what if any tests can we write for it?
12.2.2 Computing with Sets
Once we have picked an element from a set, it’s often useful to obtain the set consisting of the remaining elements. We have already seen that choosing the first field of a pick-some is similar to taking the “first” of a set. We therefore want a way to get the “rest” of the set. However, we want the rest to what we obtain after excluding this particular “first”. That’s what the second field of a pick-some is: what’s left of the set.
fun my-set-size(shadow s :: Set) -> Number: cases (Pick) s.pick(): | pick-none => 0 | pick-some(e, r) => 1 + my-set-size(r) end end
12.3 Combining Structured and Collective Data
As the above examples illustrate, a program’s data organization will often involve multiple kinds of compound data, often deeply intertwined. Let’s first think of these in pairs.
Exercise
Come up with examples that combine:
structured and conditional data,
structured and collective data, and
conditional and collective data.
You’ve actually seen examples of each of these above. Identify them.
Finally, we might even have all three at once. For instance, a filesystem is usually a list (collective) of files and folders (conditional) where each file has several attributes (structured). Similarly, a social network has a set of pages (collective) where each page is for a person, organization, or other thing (conditional), and each page has several attributes (structured). Therefore, as you can see, combinations of these arise naturally in all kinds of applications that we deal with on a daily basis.
Exercise
Take three of your favorite Web sites or apps. Identify the kinds of data they present. Classify these as structured, conditional, and collective. How do they combine these data?