Language Fun: Are You a Super Scientist?

We’ve got one last Language Fun game for you to try!

Answer a couple of quick personality questions and we’ll tell you if you’ve got what it takes to be a scientist yourself. Are you a Super Scientist, or maybe just a Sort-of-Scientist?


Take our quiz and find out: click here to play!

How did it go? What ranking did you get? Tell us in the comments!

**Although we moderate every comment before it gets posted, please remember to be kind to others and mindful of your personal information before you post here!**

Week 10: Final Results

Welcome to our final post for the summer!

This week is going to be a bit different from the rest as we wrap things up, and instead of a written post, we’ve got a video for you that gives a few updates and summarizes what we accomplished together.

You can watch that video above! Then, come back here to read the rest of what we’ve got.


We have two final tasks for you, if you want to help out!

First, we’d like to hear your thoughts about our project. Click here to take a quick survey where you can tell us what you liked, what you didn’t like, and how we can make this the best experience possible next time.

Second, we do plan on running similar Citizen Science programs in the future! If you want to join us again, you can click here to leave your email address so we can keep you in the loop. (You’ll also have a chance to give us your email in the feedback survey above; if you’ve done that already, no need to fill this one out too!)


That’s a wrap on this summer’s work. Thank you so much for reading and contributing! We’ll see you next time.

  • The BLNDIY Team



Language Fun: Dialects

Who do you sound like?

There’s all kinds of variation in the way people talk, which can be influenced by all kinds of things, too. Our personalities, identities, and origins all have a part to play in our unique versions of our languages. How do different people speak English in the United States, and can we decide where people are from based on their dialect?


Can we guess where you’re from? Or if you don’t live in the U.S., where would you live based on the words you use? Click here to find out!

Did we get it right? What do you think about the different words we might use? Tell us about it in the comments!


Week 9: Analyze Data

The results are in!

First, as always, feel free to sign up for our leaderboard (and more) if you’d like to help us out even further. If you’re just joining us, we’ve got the whole summer’s work archived for you to look through and get up to speed, or jump in now anyway.

This week, we’ll be doing some statistical analysis of the data to know which effects are real (or “statistically significant”) and which we need more evidence on.


So, let’s get started!

When researchers conduct statistical analyses, they are trying to draw objective conclusions based solely on the data. Last week, we explored the data visually and looked for any patterns we might see. Humans naturally look for patterns in everything, though. That’s why people see clouds that look like everyday objects or find faces in burnt pieces of toast. By conducting statistical analysis, we can decide which of the patterns we saw last week have enough evidence for us to argue that they actually exist!

When a scientist calculates the stats for their data, they choose a set of “tests” to run, each of which is designed to look at a specific kind of difference. The exact type of analysis you use depends on the type of data you’re looking at. Whatever the type, statistical tests give at least two values: the test statistic and the p-value. The test statistic is specific to the test and hard to interpret on its own, so we usually focus on the p-value. The p-value is the probability of seeing a difference or effect at least as big as the one in our data if there were really no difference in the ‘true’ values. In other words, a small p-value tells us that our result would be surprising if nothing were actually going on.

If the p-value is less than 0.05, we say that the test is statistically significant: the evidence supports the effect or difference being ‘real’ rather than a fluke of our particular sample.


We’ve got a lot of good info for you this week, because we want to look at a lot of potential effects and tell you how we reached our conclusions! So, we’re going to divide things up a bit and let you jump around to different parts of the post as you see fit. Each section will start with an explanation on the type of test we used on the data and then give our results for the thing we were looking at, so feel free to only read the bits you want to. Don’t miss the final results for the guesses you made in Week 7, too! Those are in with their relevant categories.

Internal vs. External Speech, & Men vs. Women: T-Tests

Age: Linear Regression

List Category: ANOVA

Personal Strategies: Fisher’s Exact Test

Final Summary



Test of Difference of Means: T-Test

T-tests are for comparing the means of two groups. A key idea behind t-tests is that the mean value of a group in the data, say the average number of words remembered by men in our experiment (12.81), is probably not the exact value we would get if we tested the entire group, i.e. all men. However, the ‘true’ value we would get if we tested all men is likely close to 12.81. To account for this, the t-test calculates a range, centered on the observed value (12.81), that should contain the ‘true’ value. The t-test compares the amount of overlap between the ranges of possible ‘true’ values for the different groups of participants, and based on the amount of overlap, it calculates a t-statistic and the related p-value. If the p-value is less than 0.05, then a difference as large as the one we observed would turn up less than 5% of the time if there were really no difference between the ‘true’ values for the two groups. In other words, our result would be quite surprising if the groups were actually the same, so we conclude that there probably is a real difference.
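If you’d like to see the arithmetic, here is a minimal Python sketch of how a t-statistic can be calculated by hand. The scores are made up for illustration (they are not our participants’ data), the function name `pooled_t_statistic` is hypothetical, and the sketch assumes the classic version of the test in which both groups have a similar spread. Turning the t-statistic into a p-value takes one more step (looking it up in a t-distribution), which statistics software normally handles.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Combine the spread of both groups into one estimate.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # How much the difference in means would wobble from sample to sample.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


# Made-up scores, for illustration only.
out_loud = [12, 14, 11, 15, 13]
in_head = [10, 13, 12, 11, 14]

t_value, df = pooled_t_statistic(out_loud, in_head)
print(f"t = {t_value:.2f}, df = {df}")  # t = 1.00, df = 8
```

The bigger the t-value (in either direction), the less the two groups’ ranges overlap, and the smaller the p-value will be.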

Statistics are reported in papers in different formats depending on the field. Our lab and field use the American Psychological Association (APA) format. In this format, statistics are presented in line in the following format: (t-value, p-value).

First, let’s see if there is a difference in performance between our two conditions.

We used a t-test to examine if there was a significant difference in the mean number of words remembered by participants told to repeat the words out loud and by participants told to repeat the words in their head (our main research question!). The test was not significant (t=0.72, p=0.48), indicating that there was not a significant difference in the ability of the two groups. Here’s what you all predicted in Week 7. Looks like half of you were right!

So, interestingly (and maybe unfortunately), we didn’t see a clear difference in performance between the two conditions. However, keep reading to see how that isn’t the full story!

Next, let’s see if there is a difference in performance between men and women.

We used a t-test again to examine if there was a significant difference in the mean number of words remembered by participants who identify as men and participants who identify as women. The test was not significant (t=0.71, p=0.48), indicating that there was not a significant difference in ability based on gender. It should be noted that one participant reported “Other” as their gender and one participant preferred not to state their gender. While both of these participants scored above average, multiple participants are necessary to draw group conclusions. Again, here’s what you thought was going to happen. You predicted a slight lean towards women performing better in the study, but were still pretty close overall!

These results indicate that there was no clear difference in performance based on gender.




Test of Relationship: Correlation/Linear Regression

Instead of looking at differences between groups, some tests examine if there is a relation between two variables. For example, the analysis below looks at the relationship between the number of remembered words and the participant’s age. A correlation examines how one value increases or decreases as the second value increases or decreases. Maybe you’ve heard of ‘causation vs. correlation’ before? That’s what we’re talking about here; a correlation is just an observed relationship between two sets of values, not necessarily a statement on how one causes the other to happen! A correlation between the number of words remembered and age is asking “as someone’s age increases, does the number of words they remember increase?” A correlation produces an r-value (similar to a t-test producing a t-value) which gives the strength and direction of the relation: the further the value is from 0, the stronger the relation. The range of r-values runs from -1 to +1, where -1 indicates that as one variable increases, the other always decreases, +1 indicates that as one variable increases, the other always increases, and 0 indicates no relation at all. For example, as a tree ages, it grows taller. A correlation examining the relation between a tree’s age and its height would have a high positive r-value because as one variable increases (age of the tree), the second variable (height of the tree) also almost always increases.

To determine a p-value for a correlation, we can use a technique known as Linear Regression. This technique tries to create a straight line that comes as close to the actual data as possible. To help understand what that means, check out the plot below to see where that line falls compared to the other data. To determine the p-value, we can examine the difference between the predicted line and the actual data.
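For the curious, here is a small Python sketch of how an r-value and a best-fit line can be calculated. The ages and scores are invented for illustration (not our study’s data), and the function name `correlation_and_line` is hypothetical. It uses the standard least-squares formulas and leaves out the p-value step, which needs a statistics package.

```python
from math import sqrt
from statistics import mean


def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for paired values using least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of how x and y vary together and on their own.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / sqrt(s_xx * s_yy)
    slope = s_xy / s_xx                  # change in y for each one-unit step in x
    intercept = y_bar - slope * x_bar    # where the line crosses x = 0
    return r, slope, intercept


# Made-up ages and word counts, for illustration only.
ages = [20, 30, 40, 50, 60]
words = [10, 12, 11, 15, 14]

r, slope, intercept = correlation_and_line(ages, words)
print(f"r = {r:.2f}, slope = {slope:.2f}")  # r = 0.84, slope = 0.11
```

In this toy example, the slope says the line rises by about 0.11 words for each extra year of age, and the r-value says the points sit fairly close to that line.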

So, how much would a participant’s age relate to their performance in the study?

A correlation comparing a participant’s age and the mean number of words they remembered was statistically significant (r=0.40, p=0.02), suggesting that there was a moderate relation between age and performance! Linear regression suggested that the number of words a participant could remember increased by 0.18 for each additional year of age. Here are the average predictions you made about age for this study. The lower the score on the graph, the higher you ranked that group, meaning you predicted they would do better in the study. Turns out you were wrong on this one! You gave the youngest participants the lowest score (putting them towards number 1 most often in your ranking), but older people actually did better!

As you can see in the plot below, while there is not a clear pattern where older participants almost always score well and younger participants almost always score poorly, there is a clear trend where higher scores tend to fall in the higher age range.




More Complex Tests: Analysis of Variance (ANOVA)

The tests we discussed before, correlation and t-test, are like the hammer and screwdriver of the scientist’s toolbox. It’s hard to complete a project without using at least one of them. If a t-test is the screwdriver of the toolbox, the next test we’ll discuss (ANalysis Of VAriance: ANOVA) is the electric drill. While a t-test is limited to only two groups, ANOVAs allow for comparison between many different groups and different types of groups. The logic is the same as the t-test, though. Take the mean of the group and, based on the difference from one participant to the next, create a range of possible values that contains the value we might get if we tested every single person possible. Then, compare the ranges for each group to decide if the groups are actually different from one another.

We have two ANOVAs to look at, which can also help explain how ANOVAs are used. First, we want to compare the mean number of words remembered for each of our list categories (animals, objects, and fruits/vegetables) to see if people did better on some lists compared to others. We have three groups though, so we can’t test all of them at once using a t-test. However, ANOVA can give us a test statistic (F-value) and p-value based on whether there is a difference between any of the three groups.
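Here is a simplified Python sketch of where an F-value comes from. It treats the three lists as separate groups of people (a “between-groups” one-way ANOVA), which is simpler than an analysis in which every participant sees all three lists. The scores are invented for illustration and the function name `one_way_f` is hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """Return the F-value for a one-way, between-groups ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # How far each group's mean sits from the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # How much scores vary inside their own group.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up scores, for illustration only.
animals = [14, 15, 16]
objects = [11, 12, 13]
fruits_veg = [14, 15, 16]

f_value = one_way_f([animals, objects, fruits_veg])
print(f"F = {f_value:.2f}")  # F = 9.00
```

A large F-value means the group means are spread far apart compared with how much scores vary inside each group.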

Our second ANOVA adds a second layer to the question and shows the real strength of ANOVA. We want to compare if there is a difference in the mean number of words remembered based on the list category AND whether a participant was told to repeat the words in their head or out loud. We’re comparing the three groups we looked at in the first ANOVA, which compared participants within-subjects. It compared a participant against themselves, i.e. how many words they remembered for each category. In our second ANOVA, though, we are also comparing between-subjects, by splitting participants based on whether they were in the “in your head” or “out loud” conditions. The ANOVA produces an F-value and p-value telling us whether the difference between list categories was itself different for the two conditions (an “interaction”).

That might sound a bit confusing, so let’s actually use ANOVA for these questions and see how it works in action.

First, was there a difference based on list category?

An ANOVA examining the effect of list category was statistically significant (F=17.15, p=0.0002), indicating that participants remembered more words for some lists compared to others. We can use t-tests to examine which specific groups were different. There was not a statistically significant difference between the number of animal words remembered and number of fruit/vegetable words remembered (t=0.03, p=0.97). There was a significant difference between the number of objects remembered and the number of fruits/vegetables remembered (t=2.80, p=0.007) and between the number of objects remembered and the number of animals remembered (t=2.68, p=0.009). The results indicate that household objects were significantly harder to remember than animals or fruits/vegetables.

| Animals | Objects | Fruits/Vegetables |
| --- | --- | --- |
| 14.64 | 11.72 | 14.60 |

Second, did it matter whether participants repeated the words out loud or in their heads?

An ANOVA examining the interaction of list category and task condition (out loud/in your head) was not significant (F=2.16, p=0.15). While the test approached significance (relatively low p-value), there was not enough evidence to conclude that there was a difference.

| Condition | Objects | Animals | Fruits/Vegetables |
| --- | --- | --- | --- |
| In Your Head | 11.41 | 15.88 | 15.29 |
| Out Loud | 12.06 | 13.61 | 13.94 |




Last one! Fisher’s Exact Test for Count Data

We have one more analysis to look at, but it’s pretty straightforward. We want to know whether being told to repeat the words in their heads or out loud affected whether or not participants used a strategy. For this we can use a Fisher’s Exact Test, which examines the ratio of Yes responses to No responses to the question “Did you use a strategy to help you remember?” for the two conditions. The test was not significant (p=0.16), although there seemed to be a trend in the data: more participants in the “in your head” condition reported using a strategy. We might need to run a follow-up study to explore this relationship further.

| Choice | In Your Head | Out Loud |
| --- | --- | --- |
| Yes | 14 | 9 |
| No | 4 | 9 |
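Because this test only needs the four counts in the table, it can be reproduced with a short Python sketch. This is one common way to compute a two-sided Fisher’s Exact Test (adding up every possible table that is at least as unlikely as the observed one), and the function name `fisher_exact_two_sided` is hypothetical. Statistics packages may differ in small details.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Chance of seeing x in the top-left cell when row and column totals are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is at least as unlikely as the one we observed.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts from the table: Yes (14 in your head, 9 out loud), No (4, 9).
p_value = fisher_exact_two_sided(14, 9, 4, 9)
print(round(p_value, 2))  # 0.16
```

The result matches the p-value reported for the strategy question.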




So, what does it all mean?

Now that we’ve done the analyses and have some objective measures of what happened, it’s time to draw some final conclusions. Were the results what you expected? We had two results with p-values around .15, which isn’t low enough for us to conclude they’re real but is low enough that it might be worth exploring more.

In the end, it looks like age was a statistically significant factor, as was the list category that participants had to remember. But neither gender nor task condition was significant (though the interaction between task condition and list category approached significance). Strategy choice showed a trend but was not statistically significant, and this should be investigated further in a follow-up study to see what we can make of it!

If you’ve read through all of this, well done! If this is your first time looking at data and stats, it may have been a little overwhelming. Don’t worry though! Science is a skill that takes time and practice, and even just learning a bit about how it all works is a big accomplishment.

Remember that, even though using statistical tests like these gives us a far more objective look at our data, this is only the beginning of an even larger process. We were only testing for our pretty specific research question this time around, but by trying to answer that question we’ve run into other cool things we might want to learn more about too! What kinds of studies might be good ways to continue the work we’ve done here so far?



In the comments section below, tell us about what you think the big takeaway message is from our results! What did we learn about internal language in our study?

Next week, we’ll be making some final conclusions about the study and looking back on everything we did this summer!


Language Fun: The Stroop Effect

Want to trick your brain?

You probably don’t think about the fact that you’re reading whenever you see text, but somehow you can still remember what you saw after driving by a billboard on the highway or after you glance at a sign somewhere. How does reading work like that? This week, we’ve got a quick version of the Language Pod’s most popular demo, which shows off the Stroop Effect.


Give the game a shot and see how you do! Click here to give it a try.

How did you do? Tell us how it went in the comments! Have your family and friends try it, too, and see who can do the task most easily or quickly. What might that mean about reading and our brains?


Week 8: Make Observations

We’ve got data!

First, as we always mention, feel free to sign up for our leaderboard or more if you’d like to help us out or get some bragging rights for participating. But just voting or commenting on our posts is more than enough, and you can do that without signing up! If you’re just joining us, we’ve got the whole summer’s work archived for you to look through and get up to speed, or jump in now anyway.

This week, we can share the raw data from your experiment for you to explore! You’ll get the chance to make some first observations, and be sure to leave a comment with anything cool you find.


So, what did we collect?

After less than a day, we had 40 people complete your study, 20 who were told to remember the words by repeating them out loud and 20 who were told to repeat the words in their heads. Hopefully this gives us some insight into how inner speech works!

CLICK HERE if you want to see everything we collected in a handy spreadsheet; then you can come back here to learn more and add your thoughts.


We’ve got a couple of graphs to get you started!

First, here’s our main effect. How well did participants seem to remember the pictures in each condition?


What about the picture categories? Did participants remember one set better than others?


Next, what if we separate the results by gender?


Finally, how much would a participant’s age relate to their performance in the study?


Of course, we have so much more we can look at! If you want to take it further, here’s all of our data in a spreadsheet again. Everything we collected is listed there, and you can find things like what strategies the participants used, whether or not they say they really followed their condition’s requirements, basic demographic info, and more!

When looking at the graphs or the spreadsheet, try to think about how they connect to our original research question! In addition to whatever neat patterns you might find, what do these data say about internal language?


Now, let’s talk a little bit about why we want to look at our data like this.

Scientists often start by making observations about the general pattern of the data through visual representations. Rather than doing too much number crunching, it’s useful to get a more general idea of what might be happening.

We have to be careful, though, that making observations in this way doesn’t lead us to conclusions we shouldn’t reach. At what point should we be convinced that an effect we think we can see is real and not just our minds making things up? That’s what the math part is for (which we’ll get into next week)! Once they have an idea of what is happening in the data and need to make final conclusions, researchers can do statistical analyses and tests. That way, everyone can agree on what the experiment can tell us objectively rather than through opinions and just what our eyes see!

We have to use visual observations and our intuitions about experimental results together with statistical analysis to make sound conclusions.


So, here’s your job for the week: before we do the stats, what do you see in the data? What questions seem to be answered, and which aren’t? Is there anything you find confusing or surprising?

In the comments section below, tell us about it! What should we be thinking about from our first look at this data? Is there anything cool? Be the scientist and make some observations!

Next week, we’ll go over what we might be looking for when we start doing our statistical analysis, and what that will look like!


Language Fun: Mayan Hieroglyphs

There are all kinds of different ways that languages can be written!

We have an alphabet in English (which you’re using to read right now!), but have you ever heard of or seen hieroglyphs? Ancient Egypt has perhaps the most famous example of this, but it isn’t the only one.


Want to learn how to read a Mayan Hieroglyph? Now’s your chance with this week’s Language Fun game!

What did you think? Tell us about how it went in the comments! Did you follow the links at the end to try to learn more?


Week 7: Running the Experiment

It’s finally time to run our study!
After all of the hard work our Citizen Scientists have put into creating, designing, and testing their ideas, Week 7 is the moment of truth.

Before we get into that, even though we’re nearing the end of this summer’s project, we’d still love to have you sign up officially if you want to. That way you can be on our leaderboard or help us learn how to do citizen science better! Feel free to keep voting or commenting without signing up, though – we’re just happy you’re here!

This week, we’re sending off the study for real people to participate in! We also want to give you one last chance to make some guesses about what might happen in our experiment.



What do you think we’re going to see? We have a couple questions for you about what the answer might be to our research question, and about how different people might do differently in the study’s task. Thinking about these kinds of things is fun, and it helps us be more aware of our biases before we start evaluating our results.


Last week, we asked you to help us test out our stimuli and see if you could name our pictures. Everyone did great, and most of the pictures were consistently named! Thanks again to everyone who tried it out. There were only a couple that had some different answers, like DVD vs. CD and hippo vs. hippopotamus. The nice thing is that these responses still show that the pictures are recognizable, and as long as participants in the study can prove they remembered a picture, it doesn’t really matter what they’re called exactly!


Now, let’s go over how we are going to get our data in our final experiment.

All of this is going to be run through a service called Prolific. We just have to make an online study, and then we can send it off to be taken by participants all over. We can specify the kinds of people who should be taking our study, like if we need only adults or those who speak a certain language, and we can pay them a fair rate for their time. If you have any friends or family who you think might want to take our survey, they can make an account on Prolific and be in all kinds of studies too!

That being said, though we’re sure you’re curious, none of our Citizen Scientists can be in the study you helped us to design. It’s for the same reason that none of us on the BLNDIY research team can participate, either: we know too much! Even if you wouldn’t mean to, taking the survey when you know the goals of the study can influence our results, especially if you expect or desire a certain outcome. That’s why we tell participants only so much about an experiment until they have finished it. We only want them to know enough to be able to do the task we’re giving them and feel safe while doing so. If we said beforehand exactly what we’re looking for, then it wouldn’t be a controlled experiment like good science should be!

Bias is really important to consider in experiments, and it has come up in a couple of steps of the experimental design we’ve done here. A really cool example of this is the Clever Hans effect. Here’s a short video telling the story of a horse that appeared to be able to do math. It turns out that the humans testing Hans accidentally gave him clues to answering arithmetic problems without even knowing they were affecting the results at all… (And here’s an extra article if you want to know more!)


That’s it for this week!

It won’t be long before we have some real data for your experiment.

Again, CLICK HERE to make some bets on what might happen in the experiment.

And don’t forget our Language Fun section on the site! If you haven’t checked it out yet, we have all kinds of cool games and quizzes that are all about language science.


Are you excited? Share what you think is gonna happen down in the comments! Will people remember more words in the out-loud condition, the in-your-head condition, or will it be about the same between the two? If there’s a difference, how big of a difference will it be?

Next week, we’ll have some preliminary results to share, and we’ll talk about making qualitative observations of our data.


Language Fun: Root Out the Word

Have you ever heard of an affix before?

Affixes are small bits of meaning that are an important part of how words are made! Prefixes are a type of affix and can turn somebody who is ‘likeable’ into someone much less pleasant to be around: ‘unlikeable’. Suffixes are another kind of affix, which might help an adjective like ‘sad’ describe other words as an adverb: ‘sadly’.


Can you find the affixes? Try our ‘Root Out the Word’ game!

How did it go? Do you have any favorite roots or affixes? Tell us about it in the comments!


Week 6: Stimuli

Welcome back!

First, if you want to help us out even more, make sure you check out our sign up page and everything we’ve got posted over there. Feel free to vote and comment without officially signing up, though!

This week, we need your help to pick the best stimuli for our experiment! Be sure to check out the results of last week’s vote down below, too.


CLICK HERE to help with the stimuli! Read below to find out why we’re doing this.

The pictures we are using are from the Bank of Standardized Stimuli (BOSS) pictorial dataset, a set of more than 1400 pictures created by Dr. Mathieu Brodeur (as well as Dr. Martin Lepage, Dr. Katherine Guérard, and many others). The dataset has been designed and tested to be standardized across a variety of factors, which are described in several publications available alongside the dataset. It was designed as a research tool to be freely shared (with credit given to the researchers responsible) for cognitive and psycholinguistic research. Check out their website, where you can find the entire set as well as the various studies that have been published!

We’ve picked a subset of these pictures to use in our experiment and we need your help testing them in a similar way to the testing done previously with the BOSS dataset. In our experiment, we’re asking participants to remember a list of pictures, so we need to ensure that a typical participant would know the name of the object. We’re going to give you 10 pictures and ask you to just write down what they are. It might seem simple, but this work is actually very important! We need to know that when participants can’t write a picture down, it’s because they didn’t remember it and not because they didn’t know what it was. This process is called norming. The BOSS dataset is such an awesome tool because other researchers have already done this for many features that could potentially impact how participants do, such as color, size of the object, brightness, and perspective. Without a dataset like this, we and many other researchers would have had to use (likely much simpler) drawings instead of high-quality photographs.


Now for the results from last week’s vote on experimental design!

Sounds like you guys were pretty much in agreement that we should go with easy words and 30 images for our study. Our participants should be grateful that they don’t have to remember the hard words, and 30 pictures will be a happy medium. That’s what we’ll do then!


We did have some questions last week that are definitely worth answering, too.

Eshmoney said: I thought of another question about the study design: Are the participants going to get all of the pictures at once or are they going to get them one at a time?

This is an important thing to think about with our experimental design, absolutely! The plan is to show the pictures one at a time, for one second each. One at a time should help keep people from getting overwhelmed or running out of time to scan through all of the pictures, while one second is long enough to fully see the picture without giving too much time to work on memorizing it. How else might it change our design or results if we gave the pictures all at once or for a different amount of time instead?


PumpkinPie54 asked: With easier words like ”dog”, would a person picture the image they were given, or their own dog? Since “dog” is a pretty relatable word, and since most people have dogs, and most people don’t have inner dialogues, wouldn’t they remember the picture… as another picture? Sorry if that sounded confusing, and if i am getting a bit off-topic.

That’s an interesting point! When we try to memorize things, we often try to link them to our own experiences and ideas. If we think about our own dogs when trying to remember the word “dog”, it probably would be easier than if we didn’t have a dog and thus didn’t link it to what we’re memorizing. Hopefully, though, by giving 30 images in a row for only a second, participants won’t really have enough time to build any kind of tricks or mnemonics that will influence our results. And by giving pictures rather than just popping the written word “dog” up on the screen, we’re helping to suggest that particular dog in the image to people rather than letting them come up with whatever image they want to!


That’s it for this week!

Here’s the link one more time for the stimuli testing we’d like you to do.

What did you think about our pictures? Did you look through the BOSS dataset website and find anything cool we should know about? Tell us about it down in the comments! Next week, we’ll start collecting the data and talk about some pitfalls and considerations when running an experiment.
