The possibilities are endless for TCU researchers crunching numbers to uncover connections.
Big data is becoming, well, a big deal at TCU. Large-scale number crunching has spread across the campus in directions few people could have imagined a decade ago.
From tracking the movement of galaxies to reworking Shakespeare’s sonnets, faculty and students are launching investigative pursuits using personal computers connected to supercomputers or cloud-based databases.
The AddRan College of Liberal Arts offers a minor in digital analytics. When the program launched in fall 2017, “we were hoping for 20 students, and we passed that number very quickly,” said Curt Rode, a senior instructor in TCU’s department of English who helped create the program. “There’s been tremendous interest from students with little [official] advertising. … We now have over 70 students.”
The liberal arts college undertook a cluster hire to bring more big data experts to campus. Rode, who is also associate director of the Center for Digital Expression, said few private universities of TCU’s size offer as much hands-on engagement with big data.
Big data applications for business and science students seem logical. But how do they fit the liberal arts?
“The emphasis on digital culture, digital technology, data analytics and digital humanities — and all other things big data — will further enhance our liberal arts majors’ marketability and preparation for a global economy,” said Sonja Watson, dean of AddRan.
Watson, who took the position in May 2020, has embraced the big data initiative launched by her predecessor, Andrew Schoolmaster. Big data already had been applied at AddRan in such departments as political science, criminal justice, geography, economics, history and English, Schoolmaster said. So it made sense to foster more collaboration and more interdisciplinary synergy across the university.
Just what is big data? According to the Brookings Institution, the term refers to massive datasets used by governments, major corporations, consulting firms and academic researchers to conduct analyses, unearth patterns and drive decision-making. Computer-enabled access to astronomical amounts of information has paid off, Brookings reported, with “a wide range of benefits, such as informing public health research, reducing traffic and identifying systemic discrimination in loan applications.”
Among the three professors in AddRan’s initial cluster hire was geographer Xiaolu Zhou, who used big data to analyze Chicago’s bike-sharing program alongside the travel patterns of the city’s residents. “Users check out the bike at one station and return it at another,” Zhou said. “If not properly balanced, the system is not optimized.” In a paper published in a 2019 issue of the Journal of Transport Geography, he also determined what factors influenced people to choose a bike-share over hailing a taxi.
Zhou, an assistant professor of geography, focuses his research on strategies for sustainable urban development. In another 2019 study, this one published in the ISPRS International Journal of Geo-Information, he employed big data to analyze Atlanta’s rental housing market.
He said he mined Craigslist ads and factored for location, amenities and apartment description, then processed the data to craft a model that predicted rental prices for particular properties. “The overall goal was to predict the price based solely on textual description.”
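The flavor of that approach can be sketched in a few lines. Everything below is hypothetical: the keywords, dollar weights and base rent are invented stand-ins for the coefficients a regression model would actually learn from real Craigslist listings.

```python
import re

# Hypothetical per-keyword dollar contributions, standing in for
# coefficients a trained model would learn from real listing data.
KEYWORD_WEIGHTS = {
    "downtown": 300, "pool": 120, "gym": 80, "renovated": 150,
}
BASE_RENT = 900  # hypothetical intercept (base monthly rent)

def predict_rent(description: str) -> int:
    """Predict monthly rent from a listing's text description alone."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return BASE_RENT + sum(w for k, w in KEYWORD_WEIGHTS.items() if k in words)

print(predict_rent("Renovated apartment downtown with pool access"))  # 1470
```

A real model would learn thousands of weights from the text of many listings; the point of the sketch is only that a price prediction can be driven entirely by words in the description.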
As in all cities, some rental listings were misleading or downright false. But with big data, Zhou said, “Data will speak for itself. The fake data will be outliers, teased out.”
The Chinese-born, Singapore- and U.S.-educated professor also harnessed big data to create a digitally animated map that showed where Covid-19 cases rose and fell across the United States during the early months of the pandemic, making the virus’s spread understandable at a glance.
Working with Johns Hopkins data, which required cleaning (fixing or deleting corrupt or inaccurate data) and reformatting, he envisioned a more user-friendly approach. “Why not create a dashboard to show where are the hot spots?” he asked. “What is the trend?”
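The cleaning step described here (fixing or deleting corrupt or inaccurate rows, normalizing formats) might look something like this toy sketch, using invented rows rather than the actual Johns Hopkins feed:

```python
import datetime

# Invented raw rows illustrating common defects in case-count data.
raw_rows = [
    {"date": "2020-03-01", "county": "Tarrant", "cases": "12"},
    {"date": "2020/03/02", "county": "Tarrant", "cases": "15"},  # wrong date format
    {"date": "2020-03-03", "county": "Tarrant", "cases": ""},    # missing value
    {"date": "2020-03-04", "county": "Tarrant", "cases": "-1"},  # impossible count
]

def clean(rows):
    """Fix what can be fixed, delete what cannot be parsed or trusted."""
    out = []
    for r in rows:
        date_str = r["date"].replace("/", "-")  # normalize date separators
        try:
            date = datetime.date.fromisoformat(date_str)
            cases = int(r["cases"])
        except ValueError:
            continue  # delete rows whose fields cannot be parsed
        if cases < 0:
            continue  # delete impossible counts
        out.append({"date": date, "county": r["county"], "cases": cases})
    return out

cleaned = clean(raw_rows)
print(len(cleaned))  # 2 usable rows survive
```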
A SEX-OBJECT SPREADSHEET
For a capstone project required for his minor in digital culture and data analytics in 2020, Paul DeHondt, then a senior, showed how big data could be used to examine popular culture.
DeHondt wanted to see if Hollywood’s depiction of women as sex objects had changed over time. To answer the question, he analyzed and compared the top-10 grossing films of the golden age of cinema, 1930-1949, with the top 10 hits released between 2000 and 2020.
DeHondt used big data to determine the top films of the respective eras. On BoxOfficeMojo.com, an industry tracking website, he found the dataset of top lifetime grosses and pasted it into a spreadsheet.
He then used the Python programming language to clean and analyze the box-office grosses. He filtered out all other time periods, split the data between the two eras he was studying, and reduced each dataset to its top 10 grossing films.
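That filtering step can be illustrated with a miniature version of the dataset. The titles and figures below are a small, roughly representative sample, not DeHondt's actual BoxOfficeMojo spreadsheet:

```python
# Miniature stand-in for the BoxOfficeMojo data: (title, year, lifetime gross).
films = [
    ("Gone with the Wind", 1939, 390_500_000),
    ("Snow White and the Seven Dwarfs", 1937, 184_900_000),
    ("Avengers: Endgame", 2019, 858_400_000),
    ("Avatar", 2009, 760_500_000),
    ("The Sound of Music", 1965, 159_300_000),  # outside both eras, dropped
]

def top_grossing(films, start, end, n=10):
    """Keep films released in [start, end], ranked by gross, top n."""
    era = [f for f in films if start <= f[1] <= end]
    return sorted(era, key=lambda f: f[2], reverse=True)[:n]

golden_age = top_grossing(films, 1930, 1949)
modern = top_grossing(films, 2000, 2020)
print(golden_age[0][0])  # highest-grossing golden-age film in the sample
```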
He then delineated examples of the “male gaze,” a term coined by British filmmaker and scholar Laura Mulvey to describe the sexualized way of observing women that provides men with voyeuristic pleasure.
Next came the time-consuming part.
Notebook in hand, DeHondt watched all 20 films — some more than three hours long — marking down instances of what he deemed the male gaze. Results were not totally unexpected. The 1930-49 era had far more instances: 51 compared with 24.
Despite improvement, “Hollywood still objectifies women today in order to satisfy male gaze,” DeHondt’s study concluded.
DeHondt, now a producer for a virtual reality company in Dallas, said his research topic — what he considers anachronistic, sexist filmmaking — veers into the subjective. Even so, his intention was to help audiences become “better film viewers” who more readily recognize poor representations of women.
WAS JANE EYRE A MAN?
Gabi Kirilloff, another big data expert at AddRan, uses massive datasets to analyze 19th- and 20th-century literature.
In a project that ran from 2014 to 2018 and was published in the Journal of Cultural Analytics, Kirilloff, assistant professor of literature and humanities, used big data in a novel way. “I was kind of curious if a computer could guess if a character was male or female based on what the character was doing,” she said.
Looking at 3,329 novels written between 1800 and 1900, she trained a computer model to select the gender based only on verbs a character performed. “A computer could correctly guess about 81 percent of the time,” she said. “Only for six novels did the computer get them backward — one of which was Jane Eyre. The data analysis is a starting point. It raises more questions than it answers. In some cases, the author was creating an unorthodox, spunky woman, as in Eyre.”
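A toy version of that idea looks like the sketch below. The verb counts are invented, standing in for the statistics a model would actually learn from the 3,329-novel corpus, and the add-one-smoothed scoring is a generic stand-in for the study's trained classifier:

```python
import math
from collections import Counter

# Hypothetical training counts: how often each verb is performed by
# characters labeled male vs. female in an annotated corpus.
VERB_COUNTS = {
    "male":   Counter({"rode": 40, "fought": 30, "said": 50, "wept": 5}),
    "female": Counter({"rode": 5, "fought": 2, "said": 45, "wept": 30}),
}

def guess_gender(verbs):
    """Pick the label whose verb profile best fits, via add-one-smoothed
    log-likelihoods (a toy stand-in for the model in the study)."""
    best, best_score = None, float("-inf")
    for label, counts in VERB_COUNTS.items():
        total = sum(counts.values()) + len(counts)  # add-one smoothing
        score = sum(math.log((counts[v] + 1) / total) for v in verbs)
        if score > best_score:
            best, best_score = label, score
    return best

print(guess_gender(["wept", "said"]))    # female
print(guess_gender(["fought", "rode"]))  # male
```

The interesting cases, as Kirilloff notes, are the misses: a character whose verbs cut against the learned pattern, like Jane Eyre, is exactly what the model's errors surface.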
In her dissertation research, some of which is under review for publication in the journal College Literature, Kirilloff used a computer to differentiate narration from dialogue — a difficult task because some Victorian novelists were not scrupulous about the use of quotation marks. “Most books in the 19th century are just a hot mess,” she said.
Kirilloff’s solution was to create a script that accounted for numerous scenarios. “It puts quotation marks where they should be,” she said. “It’s not perfect, but I did a small test on 30 novels, and it was 80 percent accurate.”
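A drastically simplified sketch of one such scenario-handling rule is below. Kirilloff's actual script accounts for many scenarios; this toy handles just one invented pattern (a speech verb followed by unquoted dialogue) to show the shape of the approach:

```python
import re

SPEECH_VERBS = r"(said|cried|replied|exclaimed)"

def repair_quotes(sentence: str) -> str:
    """If a sentence of the form '<Name> said, <speech>' lacks quotation
    marks, wrap the speech in them. A toy stand-in for the rule-based
    script described in the study."""
    if '"' in sentence:
        return sentence  # already punctuated; leave it alone
    m = re.match(rf"(.*\b{SPEECH_VERBS},) (.*)", sentence)
    if m:
        return f'{m.group(1)} "{m.group(3)}"'
    return sentence

print(repair_quotes("Jane said, I must leave Thornfield."))
```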
The digital humanities specialist also applied her data-informed approach to identify a literary device variously called authorial intrusion, reader address or “dear reader” — akin to an actor’s aside to the audience.
“African American authors often use this device to question white readers’ ability to fully empathize with Black characters,” Kirilloff said, citing Harriet Jacobs writing in Incidents in the Life of a Slave Girl, “O reader, can you imagine my joy? No, you cannot, unless you have been a slave mother.”
Reader address was “very popular in the 19th century but never went away,” Kirilloff said. “Modernists don’t totally stop using it, although Victorians were more likely to call the reader ‘reader,’ while modernists would call the reader ‘you.’ ”
PAYING CASH FOR THE SALAD BAR
In the Neeley School of Business, Sarang Sunder used big data to solve a seemingly intractable problem American businesses face: missing or partially observable data on customers.
Creating a profile of customers based on purchase history is easy when they hand over credit cards or use a loyalty card. “But the system breaks down when customers use cash,” said Sunder, assistant professor of marketing.
And then there are those who pay with credit cards sometimes, greenbacks other times, he said. “The company knows that $5 was spent; it just doesn’t know who paid for it.”
Cash transactions can account for as much as 40 percent of total sales, especially in gas stations and restaurants. Ignoring the cash customer when analyzing data, he said, could lead a company to make suboptimal marketing decisions.
For two years, Sunder and his research collaborator Yi Zhao, a marketing professor at Georgia State University, have been trying to decipher how customers behave across different payment instruments in order to predict behavior for all patrons. “How do you construct a holistic view? How do we characterize customers even though some behavior — say, cash transactions in this case — is invisible to the firm?” Sunder said. “The issue becomes even more complex when there are so many assortments and choice combinations that the customer has.”
For instance, which of the numerous combinations of salad bar items would a patron select when there are so many choices? Developing meaningful customer insights from data becomes especially difficult when the customer information may not be fully observable or when customer choices may include a large assortment of items.
In a recent working paper, Sunder and Zhao took sales data from a national fast-food chain and applied a modeling approach to estimate the probabilities that particular customers would use cash. How? They developed a methodology to address the problem of missing data in a way that was sufficiently scalable to handle big data.
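The working paper's actual methodology is not described here, but the core idea of probabilistically attributing an anonymous cash transaction to known customers can be sketched with Bayes' rule. The visit rates and cash propensities below are entirely hypothetical:

```python
# Hypothetical customer profiles: (visits per week, probability of paying cash).
CUSTOMERS = {
    "A": (5, 0.1),
    "B": (2, 0.6),
    "C": (1, 0.9),
}

def cash_attribution():
    """P(customer | a cash transaction) is proportional to how often the
    customer visits times how often that customer pays cash."""
    weights = {c: rate * p_cash for c, (rate, p_cash) in CUSTOMERS.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

probs = cash_attribution()
print(max(probs, key=probs.get))  # the most likely cash payer in the sketch
```

Once such probabilities are in hand, each anonymous transaction can be matched to likely customers, which is the "more holistic view" the next paragraph describes.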
Once a customer’s information is matched with potential cash transactions, the company can then apply standard statistical techniques to analyze shopper behavior using a more holistic view.
In this way, the company has a better picture of who the customer really is, Sunder said, which allows the business to build robust marketing strategies and develop deep insights from data.
WRITTEN IN THE STARS
Peter Frinchaboy’s view stretches far beyond suburban fast-food franchises.
The astronomer uses big data to study the chemical makeup and other characteristics of stars and galaxies deep in space. TCU is an academic collaborator in the Sloan Digital Sky Survey, which Frinchaboy joined in 2007.
“One of the biggest surprises is that stars don’t stay where they are born in the Milky Way,” said Frinchaboy, associate professor of astronomy. “They migrate.”
Astronomers have developed theories that such celestial bodies might shift around, said Frinchaboy, overall survey coordinator for SDSS-IV, which is the fourth generation of the survey, operating from 2014 to 2020. “But we were able to show through chemistry that the migration [is] happening. It had been debated. Now the debate is over.”
The survey takes light from a star and puts it through a prism. From that, he said, “we can see all of the different absorption lines of different chemicals — the chemical fingerprint.”
To test the migration theory, his team of sky survey astronomers collected millions of data points, including location in the galaxy, speeds of stars and chemical measurements of tens of elements per star.
“I remember we were happy to have data for a couple of hundred stars,” said Frinchaboy, who has published his research in the Astronomical Journal. “And now we’re working with [over] half a million.”
At Neeley, Minakshi Trivedi and Sarang Sunder used big data to analyze the effectiveness of a South Korean policy that banned teenagers from playing video games at late hours. The ban was prompted by an addiction crisis that resulted in neglect of school activities, dropping grades and increasing isolation and depression, leading in severe cases to suicide.
But how effective was this effort? Trivedi, the J. Vaughn and Evelyne H. Wilson Professor of Marketing and chair of the Neeley Analytics Initiative, and co-author Sunder analyzed data from an online fantasy baseball game and found that casual players did indeed play less. As the researchers explain in a 2020 Marketing Science study, heavy gamers — the top 10 percent — found ways to work around the curfew by cutting into school and family time and in fact exacerbated their gaming-related problems.
In the long run, while the ban did prevent light players from becoming game junkies, Trivedi said, “it was of little value to the already heavy gamers. … A more nuanced approach than a simple ban on all online teenage gamers would be required to find a solution to what is a growing global problem.”
Their findings may inform the design of other regulatory and policy solutions aimed at curbing the negative impact of social excesses, such as sugar taxes to curb obesity, heavier excise taxes to prevent smoking and bans on plastic bags to encourage eco-friendly behavior.
More importantly, big data could play a critical role in proving whether these attempts work and, if so, why.
And these TCU students and researchers from an array of academic disciplines have shown that the possibilities for big data are as endless as the numbers themselves.
BY BARRY SHLACHTER
PHOTOS BY RODGER MALLISON